1b:["$","$L29",null,{"isWhiteLabelled":false,"children":["$","$Lb",null,{"pt":{"compact":0,"expanded":3},"children":[["$","$L2a",null,{"noStar":true,"publisher":true,"task":true,"params":true,"size":"xl","product":{"id":"eyJwYXBlcklEIjoiMTgwNi4wNzEwNCIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","updated":"2018-06-19T08:42:45.000Z","paperID":"1806.07104","published":"2018-06-19T08:42:45.000Z","authors":"[\"Alon Cohen\",\"Avinatan Hassidim\",\"Tomer Koren\",\"Nevena Lazic\",\"Yishay Mansour\",\"Kunal Talwar\"]","title":"Online Linear Quadratic Control","scoreTrending":null,"summary":"We study the problem of controlling linear time-invariant systems with known\nnoisy dynamics and adversarially chosen quadratic losses. We present the first\nefficient online learning algorithms in this setting that guarantee\n$O(\\sqrt{T})$ regret under mild assumptions, where $T$ is the time horizon. Our\nalgorithms rely on a novel SDP relaxation for the steady-state distribution of\nthe system. Crucially, and in contrast to previously proposed relaxations, the\nfeasible solutions of our SDP all correspond to \"strongly stable\" policies that\nmix exponentially fast to a steady state.","lastCheckedForCode":"2022-09-05T00:11:38.482Z","links":[{"id":"eyJ1cmwiOiJodHRwczovL3BhcGVyc3dpdGhjb2RlLmNvbS9wYXBlci9vbmxpbmUtbGluZWFyLXF1YWRyYXRpYy1jb250cm9sIn0=","type":"pwc","url":"https://paperswithcode.com/paper/online-linear-quadratic-control","data":"{\"date\":\"2024-09-04T20:15:24.780Z\"}"}],"reposConnection":{"edges":[]},"models":[],"tags":[],"summaries":[{"model":"gpt-4o-mini","header":"paper.summary.expertise.beginner","summary":"This research paper explores how to control systems (like heating or cooling systems) that have noisy rules and changing costs. The authors created two smart algorithms that learn how to adapt to these changing conditions over time while keeping costs low. They used a math technique called semidefinite programming to help find the best control strategies. 
The algorithms showed good results, especially when applied to a real data center's cooling system, demonstrating that they could quickly adjust to changes in energy costs while maintaining efficient operation."}],"emailsConnection":{"edges":[]},"__typename":"paper","authorArray":["Alon Cohen","Avinatan Hassidim","Tomer Koren","Nevena Lazic","Yishay Mansour","Kunal Talwar"]}}],["$","$L18",null,{"container":true,"columns":100,"spacing":{"compact":0,"expanded":2,"large":3},"children":[["$","$L18",null,{"size":{"compact":100,"expanded":100,"large":68},"children":[["$","$7",null,{"children":["$","$L2b",null,{"publisher":"arxiv","paperID":"1806.07104","product":{"paper":"$1b:props:children:props:children:0:props:product","models":"$1b:props:children:props:children:0:props:product:models"},"isWhiteLabelled":false}]}],["$","$7",null,{"children":["$","$L2c",null,{"article":"$L2d","model":"$undefined"}]}]]}],["$","$L18",null,{"size":"grow","children":["$","$L2e",null,{}]}]]}],["$","$7",null,{"children":null}],[["$","audio",null,{"id":"tts"}],["$","$L2f",null,{"paperID":"1806.07104","publisher":"arxiv","paperJSON":{"title":"Online Linear Quadratic Control","paperID":"1806.07104","avgLineHeight":11.92,"imgScale":4,"sections":[{"heading":"Abstract","paragraphs":[[{"text":"We study the problem of controlling linear time-invariant systems with known noisy dynamics and adversarially chosen quadratic losses. We present the first efficient online learning algorithms in this setting that guarantee ","element":"span"},{"style":{"height":16.55},"width":114.58,"height":41.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/0-0.png","element":"img","alt":" O(√T)","inline":true,"padRight":true},{"text":"regret under mild assumptions, where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"is the time horizon. Our algorithms rely on a novel SDP relaxation for the steady-state distribution of the system. 
Crucially, and in contrast to previously proposed relaxations, the feasible solutions of our SDP all correspond to “strongly stable” policies that mix exponentially fast to a steady state.","element":"span"}]]},{"heading":"1 Introduction","paragraphs":[[{"text":"Linear-quadratic (LQ) control is one of the most widely studied problems in control theory ","element":"span"},{"href":"#id-0","referenceIndex":6,"text":"(Anderson ","element":"a"},{"href":"#id-0","referenceIndex":6,"text":"et al., ","element":"a"},{"href":"#id-0","referenceIndex":6,"text":"1972; ","element":"a"},{"href":"#id-1","referenceIndex":11,"text":"Bertsekas, ","element":"a"},{"href":"#id-1","referenceIndex":11,"text":"1995; ","element":"a"},{"href":"#id-2","referenceIndex":36,"text":"Zhou et al., ","element":"a"},{"href":"#id-2","referenceIndex":36,"text":"1996)","element":"a"},{"text":". It has been applied successfully to problems in statistics, econometrics, robotics, social science and physics. In recent years, it has also received much attention from the machine learning community, as increasingly difficult control problems have led to demand for data-driven control systems ","element":"span"},{"href":"#id-3","referenceIndex":4,"text":"(Abbeel et al., ","element":"a"},{"href":"#id-3","referenceIndex":4,"text":"2007; ","element":"a"},{"href":"#id-4","referenceIndex":27,"text":"Levine et al., ","element":"a"},{"href":"#id-4","referenceIndex":27,"text":"2016; ","element":"a"},{"href":"#id-5","referenceIndex":32,"text":"Sheckells et al., ","element":"a"},{"href":"#id-5","referenceIndex":32,"text":"2017)","element":"a"},{"text":".","element":"span"}],[{"text":"In LQ control, both the state and action are real-valued vectors. The dynamics of the environment are linear in the state and action, and are perturbed by Gaussian noise. The cost is quadratic in the state and control (action) vectors. 
The optimal control policy, which minimizes the cost, selects the control vector as a linear function of the state vector, and can be derived by solving the algebraic Riccati equations.","element":"span"}],[{"text":"The main focus of this work is control of linear systems whose quadratic costs vary in an unpredictable way. This problem may arise in settings such as building climate control in the presence of time-varying energy costs, due to energy auctions or unexpected demand fluctuations. To measure how well a control system adapts to time-varying costs, it is common to consider the notion of regret: the difference between the total cost of the controller, one that is only aware of previously observed costs, and that of the best fixed control policy in hindsight. This notion has been thoroughly studied in the context of online learning, and particularly in that of online convex optimization ","element":"span"},{"href":"#id-6","referenceIndex":15,"text":"(Cesa-Bianchi & Lugosi, ","element":"a"},{"href":"#id-6","referenceIndex":15,"text":"2006; ","element":"a"},{"href":"#id-7","referenceIndex":21,"text":"Hazan, ","element":"a"},{"href":"#id-7","referenceIndex":21,"text":"2016; ","element":"a"},{"href":"#id-8","referenceIndex":31,"text":"Shalev-Shwartz, ","element":"a"},{"href":"#id-8","referenceIndex":31,"text":"2012)","element":"a"},{"text":". LQ control was considered in the context of regret by ","element":"span"},{"href":"#id-9","referenceIndex":3,"text":"Abbasi-Yadkori et al. ","element":"a"},{"href":"#id-9","referenceIndex":3,"text":"(2014)","element":"a"},{"text":", who give a learning algorithm for the problem of tracking an adversarially changing target in a system with noiseless linear dynamics.","element":"span"}],[{"text":"In this paper we consider online learning with fixed, known, linear dynamics and adversarially chosen quadratic cost matrices. 
Our main results are two online algorithms that achieve ","element":"span"},{"style":{"height":18.3},"width":125.34,"height":45.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/0-1.png","element":"img","alt":" O(√T)","inline":true,"padRight":true},{"text":"regret, when comparing to any fast mixing linear policy.","element":"span"},{"style":{"height":7.6},"width":16,"height":19,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/0-2.png","element":"img","alt":"1","inline":true,"padRight":true},{"text":"One of our online algorithms is based on Online Gradient Descent of ","element":"span"},{"href":"#id-10","referenceIndex":37,"text":"Zinkevich ","element":"a"},{"href":"#id-10","referenceIndex":37,"text":"(2003)","element":"a"},{"text":". The other is based on Follow the Lazy Leader of ","element":"span"},{"href":"#id-11","referenceIndex":24,"text":"Kalai & Vempala ","element":"a"},{"href":"#id-11","referenceIndex":24,"text":"(2005)","element":"a"},{"text":", a variant of Follow the Perturbed Leader with only ","element":"span"},{"style":{"height":18.3},"width":125.03,"height":45.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/0-3.png","element":"img","alt":" O(√T)","inline":true,"padRight":true},{"text":"expected number of policy switches.","element":"span"}],[{"text":"Overall, our approach follows ","element":"span"},{"href":"#id-12","referenceIndex":18,"text":"Even-Dar et al. ","element":"a"},{"href":"#id-12","referenceIndex":18,"text":"(2009)","element":"a"},{"text":". We first show how to perform online learning in an “idealized setting”, a hypothetical setting in which the learner can immediately observe the steady-state cost of any chosen control policy. 
We proceed to bound the gap between the idealized costs and the actual costs.","element":"span"}],[{"text":"Our technique is conceptually different to most learning problems: instead of predicting a policy and observing its steady-state cost, the learner predicts a steady-state distribution and derives from it a corresponding policy. Importantly, this view allows us to cast the idealized problem as a semidefinite program which minimizes the expected costs as a function of a steady state distribution (of both states and controls). As the problem is now convex, we apply OGD and FLL to the SDP and argue about fast-mixing properties of its feasible solutions.","element":"span"}],[{"text":"For online gradient descent, we define a “sequential strong stability” property that couples consecutive control matrices, and show that it guarantees that the observed state distributions closely track those generated in the idealized setting. We then show that the sequence of policies generated by the online gradient descent algorithm satisfies this property. In Follow the Lazy Leader, following each switch our algorithm resets the system—a process that takes a constant number of rounds, after which the cost of playing the new policy is less than its steady-state cost.","element":"span"}],[{"text":"The holy grail of reinforcement learning is controlling a dynamical stochastic system under uncertainty, and clearly both MDPs and LQ control are well within this mission statement. There are obvious differences between the two models: MDPs model discrete state and action dynamics while LQ control addresses continuous linear dynamics with a quadratic cost. In this work we are inspired by methodologies from online-MDP and regret minimization to derive new results for LQ control. 
We believe that exploring the interface between the two will be fruitful for both sides, and holds significant potential for the future RL research agenda.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"1.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Related Work","element":"span"}],[{"text":"LQ control can be seen as a continuous analogue of the discrete Markov Decision Process (MDP) model. As such, our results are conceptually similar to those of ","element":"span"},{"href":"#id-12","referenceIndex":18,"text":"Even-Dar et al. ","element":"a"},{"href":"#id-12","referenceIndex":18,"text":"(2009)","element":"a"},{"text":", who derive regret bounds for MDPs with known dynamics and changing rewards. However, our technical approach and the derivation of our algorithms are very different from those applicable in the context of MDPs.","element":"span"}],[{"text":"Among the many follow-up works to ","element":"span"},{"href":"#id-12","referenceIndex":18,"text":"Even-Dar et al. ","element":"a"},{"href":"#id-12","referenceIndex":18,"text":"(2009)","element":"a"},{"text":", let us note ","element":"span"},{"href":"#id-13","referenceIndex":35,"text":"Yu et al. ","element":"a"},{"href":"#id-13","referenceIndex":35,"text":"(2009) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-14","referenceIndex":1,"text":"Abbasi ","element":"a"},{"href":"#id-14","referenceIndex":1,"text":"et al. ","element":"a"},{"href":"#id-14","referenceIndex":1,"text":"(2013) ","element":"a"},{"text":"that propose lazy algorithms similar to our second algorithm. We remark that, compared to our ","element":"span"},{"style":{"height":18.3},"width":125.17,"height":45.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/1-0.png","element":"img","alt":" O(√T)","inline":true,"padRight":true},{"text":"regret bounds, ","element":"span"},{"href":"#id-9","referenceIndex":3,"text":"Abbasi-Yadkori et al. 
","element":"a"},{"href":"#id-9","referenceIndex":3,"text":"(2014) ","element":"a"},{"text":"give an ","element":"span"},{"style":{"height":18.73},"width":167.96,"height":46.82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/1-1.png","element":"img","alt":" O(log2 T)","inline":true,"padRight":true},{"text":"regret bound under much stronger assumptions.","element":"span"},{"href":"#id-15","style":{"height":7.6},"width":16,"height":19,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/1-2.png","element":"img","alt":"2","inline":true,"padRight":true},{"text":"Similar bounds are established by ","element":"span"},{"href":"#id-16","referenceIndex":29,"text":"Neu & Gómez ","element":"a"},{"href":"#id-16","referenceIndex":29,"text":"(2017) ","element":"a"},{"text":"for online learning in linearly solvable MDPs, that were shown to capture appropriately discretized versions of LQ control systems ","element":"span"},{"href":"#id-17","referenceIndex":33,"text":"(Todorov, ","element":"a"},{"href":"#id-17","referenceIndex":33,"text":"2009)","element":"a"},{"text":". In light of these results, it is interesting to investigate whether our bounds are tight or can actually be improved. We leave this investigation for future work.","element":"span"}],[{"text":"An orthogonal line of research that has gained popularity in recent years is controlling linear quadratic systems with unknown fixed dynamics. 
The majority of recent papers deal with off-policy learning: either by policy gradient ","element":"span"},{"href":"#id-18","referenceIndex":19,"text":"(Fazel et al., ","element":"a"},{"href":"#id-18","referenceIndex":19,"text":"2018)","element":"a"},{"text":"; by estimating the transition matrices ","element":"span"},{"href":"#id-19","referenceIndex":16,"text":"(Dean et al., ","element":"a"},{"href":"#id-19","referenceIndex":16,"text":"2017)","element":"a"},{"text":"; or by improper learning ","element":"span"},{"href":"#id-20","referenceIndex":22,"text":"(Hazan et al., ","element":"a"},{"href":"#id-20","referenceIndex":22,"text":"2017; ","element":"a"},{"href":"#id-21","referenceIndex":7,"text":"Arora et al., ","element":"a"},{"href":"#id-21","referenceIndex":7,"text":"2018)","element":"a"},{"text":". In contrast to that, ","element":"span"},{"href":"#id-22","referenceIndex":2,"text":"Abbasi-Yadkori & ","element":"a"},{"href":"#id-22","referenceIndex":2,"text":"Szepesvári ","element":"a"},{"href":"#id-22","referenceIndex":2,"text":"(2011) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-23","referenceIndex":23,"text":"Ibrahimi et al. 
","element":"a"},{"href":"#id-23","referenceIndex":23,"text":"(2012) ","element":"a"},{"text":"present an on-policy learning algorithm with ","element":"span"},{"style":{"height":18.29},"width":249.54,"height":45.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/1-3.png","element":"img","alt":" O(√T) regret.","inline":true}],[{"text":"Semidefinite programming for LQ control has been previously used ","element":"span"},{"href":"#id-24","referenceIndex":10,"text":"(Balakrishnan & Vandenberghe, ","element":"a"},{"href":"#id-24","referenceIndex":10,"text":"2003; ","element":"a"},{"href":"#id-25","referenceIndex":17,"text":"Dvijotham et al., ","element":"a"},{"href":"#id-25","referenceIndex":17,"text":"2013; ","element":"a"},{"href":"#id-26","referenceIndex":25,"text":"Lee & Hu, ","element":"a"},{"href":"#id-26","referenceIndex":25,"text":"2016)","element":"a"},{"text":", mostly in the context of infinite-horizon constrained LQRs ","element":"span"},{"href":"#id-27","referenceIndex":26,"text":"(Lee & Khargonekar, ","element":"a"},{"href":"#id-27","referenceIndex":26,"text":"2007; ","element":"a"},{"href":"#id-28","referenceIndex":30,"text":"Schildbach et al., ","element":"a"},{"href":"#id-28","referenceIndex":30,"text":"2015)","element":"a"},{"text":". In many of these formulations, one has to solve the SDP exactly to obtain a stabilizing solution; in other words, only the optimal policy is known to be stable and suboptimal policies need not be stabilizing. 
This is not the case in our SDP formulation, as any feasible solution is not only stable but, in fact, strongly-stable (see the formal definition in ","element":"span"},{"text":"Section 3)","element":"span"},{"text":".","element":"span"}]]},{"heading":"2 Background","paragraphs":[[{"style":{"fontWeight":"bold"},"text":"2.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Linear Quadratic Control","element":"span"}],[{"text":"The standard linear quadratic (Gaussian) control problem is as follows. Let ","element":"span"},{"style":{"height":15.78},"width":131.42,"height":39.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/1-4.png","element":"img","alt":" xt ∈ Rd ","inline":true,"padRight":true},{"text":"be the system state at time ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"and let ","element":"span"},{"style":{"height":15.78},"width":134.95,"height":39.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/1-5.png","element":"img","alt":" ut ∈ Rk","inline":true,"padRight":true},{"text":"be the control (action) taken at time ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":". The system transitions to the next state using linear time-invariant dynamics","element":"span"}],[{"id":"id-15","style":{"width":"61%"},"width":1115,"height":62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/1-6.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":9.19},"width":40.53,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-0.png","element":"img","alt":" wt","inline":true,"padRight":true},{"text":"are i.i.d. 
Gaussian noise vectors with zero mean and covariance ","element":"span"},{"style":{"height":13.2},"width":116.3,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-1.png","element":"img","alt":" W ⪰ 0","inline":true,"padRight":true},{"text":". The cost incurred at each time point is a quadratic function of the state and control, ","element":"span"},{"style":{"height":17.72},"width":269.4,"height":44.31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-2.png","element":"img","alt":" xTt Qxt + uTt Rut","inline":true},{"text":", for positive definite ","element":"span"},{"text":"matrices ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"R","element":"span"},{"text":".","element":"span"}],[{"text":"A policy is a mapping ","element":"span"},{"style":{"height":13.78},"width":216.53,"height":34.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-3.png","element":"img","alt":" π : Rd ↦ Rk","inline":true,"padRight":true},{"text":"from the current state ","element":"span"},{"style":{"height":9.19},"width":34.77,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-4.png","element":"img","alt":" xt","inline":true,"padRight":true},{"text":"to a control (i.e., an action) ","element":"span"},{"style":{"height":9.19},"width":34.81,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-5.png","element":"img","alt":" ut","inline":true},{"text":". 
The cost of a policy after ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"time steps is","element":"span"}],[{"style":{"width":"31%"},"width":567,"height":116,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-6.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":10},"width":175.06,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-7.png","element":"img","alt":" u1, . . . , uT","inline":true,"padRight":true},{"text":"are chosen according to ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-8.png","element":"img","alt":" π","inline":true},{"text":"; the expectation is w.r.t. the randomness in the state transitions and (possibly) the policy. In the infinite-horizon version of the problem, the goal is to minimize the steady-state cost ","element":"span"},{"style":{"height":16},"width":491.2,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-9.png","element":"img","alt":" J(π) = limT →∞(1/T)JT (π).","inline":true}],[{"text":"In the infinite-horizon setting and when the system is controllable,","element":"span"},{"style":{"height":7.6},"width":16,"height":19,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-10.png","element":"img","alt":"3 ","inline":true,"padRight":true},{"text":"it is well-known that the optimal policy is given by constant linear feedback ","element":"span"},{"style":{"height":13.19},"width":166.92,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-11.png","element":"img","alt":" ut = Kxt","inline":true},{"text":". 
For the optimal ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K","element":"span"},{"text":", the dynamics are given by ","element":"span"},{"style":{"height":15.6},"width":563.57,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-12.png","element":"img","alt":"xt+1 = (A + BK)xt + wt, and K","inline":true,"padRight":true},{"text":"is guaranteed to be stable; a policy ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K ","element":"span"},{"text":"is called stable if ","element":"span"},{"style":{"height":15.6},"width":278.62,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-13.png","element":"img","alt":" ρ(A + BK) < 1,","inline":true,"padRight":true},{"text":"where for a matrix ","element":"span"},{"style":{"height":16},"width":164.6,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-14.png","element":"img","alt":" M, ρ(M)","inline":true,"padRight":true},{"text":"is the spectral radius of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"M","element":"span"},{"text":". In this case, ","element":"span"},{"style":{"height":9.19},"width":34.78,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-15.png","element":"img","alt":" xt","inline":true,"padRight":true},{"text":"converges to a steady-state (stationary) distribution, i.e., ","element":"span"},{"style":{"height":9.19},"width":34.78,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-16.png","element":"img","alt":" xt","inline":true,"padRight":true},{"text":"has the same distribution as ","element":"span"},{"style":{"height":15.6},"width":280.85,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-17.png","element":"img","alt":" (A+BK)xt+wt","inline":true},{"text":". 
This implies that ","element":"span"},{"style":{"height":15.6},"width":168.11,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-18.png","element":"img","alt":" E[xt] = 0,","inline":true,"padRight":true},{"text":"and the covariance matrix ","element":"span"},{"style":{"height":17.78},"width":981.94,"height":44.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-19.png","element":"img","alt":" X = E[xtxTt ] satisfies X = (A + BK)X(A + BK)T + W.","inline":true}],[{"text":"The steady-state cost of a stable policy ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K ","element":"span"},{"text":"with steady-state covariance ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"is given by ","element":"span"},{"style":{"fontStyle":"italic"},"text":"J","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"K","element":"span"},{"text":") = ","element":"span"},{"style":{"height":17.79},"width":451.42,"height":44.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-20.png","element":"img","alt":"(Q + KTRK) • X. Here •","inline":true,"padRight":true},{"text":"denotes element-wise inner product, i.e., ","element":"span"},{"style":{"height":17.79},"width":317.58,"height":44.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-21.png","element":"img","alt":" A • B = Tr(ATB).","inline":true}],[{"style":{"fontWeight":"bold"},"text":"2.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Problem Setting","element":"span"}],[{"text":"We consider an online setting, where a sequence of positive definite cost matrices ","element":"span"},{"style":{"height":14},"width":408.23,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-22.png","element":"img","alt":" Q1, . . . , QT , R1, . . . 
, RT","inline":true,"padRight":true},{"text":"is chosen by the environment ahead of time and unknown to the learner. We assume throughout that ","element":"span"},{"style":{"height":16},"width":353.33,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-23.png","element":"img","alt":"Tr(Qt), Tr(Rt) ≤ C","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":", for some constant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C > ","element":"span"},{"text":"0","element":"span"},{"text":". ","element":"span"},{"text":"We assume that the dynamics ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"A, B","element":"span"},{"text":") ","element":"span"},{"text":"are time-invariant and known, and that the system is initialized at ","element":"span"},{"style":{"height":13.19},"width":113.89,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-24.png","element":"img","alt":" x0 = 0","inline":true},{"text":". At each time step ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":", the learner observes the state ","element":"span"},{"style":{"height":9.19},"width":34.78,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-25.png","element":"img","alt":" xt","inline":true},{"text":", chooses an action ","element":"span"},{"style":{"height":9.19},"width":34.81,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-26.png","element":"img","alt":" ut","inline":true},{"text":", and suffers cost ","element":"span"},{"style":{"height":17.72},"width":296.41,"height":44.31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-27.png","element":"img","alt":" xTt Qtxt + uTt Rtut","inline":true},{"text":". 
Thereafter, the system ","element":"span"},{"text":"transitions to the next state.","element":"span"}],[{"text":"A (randomized) learning algorithm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"is a mapping from ","element":"span"},{"style":{"height":9.19},"width":34.78,"height":22.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-28.png","element":"img","alt":" xt","inline":true,"padRight":true},{"text":"and the previous cost matrices ","element":"span"},{"style":{"height":14},"width":202.46,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-29.png","element":"img","alt":" Q0, ..., Qt−1","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":14},"width":219.9,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-30.png","element":"img","alt":" R0, . . . , Rt−1","inline":true,"padRight":true},{"text":"to a distribution over a control ","element":"span"},{"style":{"height":9.19},"width":34.81,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-31.png","element":"img","alt":" ut","inline":true},{"text":". We define the cost of an algorithm as ","element":"span"},{"style":{"height":16},"width":155.46,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-32.png","element":"img","alt":" JT (A) =","inline":true},{"style":{"height":20.4},"width":763.2,"height":50.99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-33.png","element":"img","alt":"E[∑Tt=1 xTt Qtxt + uTt Rtut], where u1, . . . 
, uT ","inline":true,"padRight":true},{"text":"are chosen at random according to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A","element":"span"},{"text":".","element":"span"}],[{"text":"The goal of the learner is to minimize the regret, defined as:","element":"span"}],[{"style":{"width":"29%"},"width":526,"height":59,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-34.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":10.8},"width":29,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-35.png","element":"img","alt":" Π","inline":true,"padRight":true},{"text":"is a set of benchmark policies. In the sequel, we fix ","element":"span"},{"style":{"height":10.8},"width":29,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-36.png","element":"img","alt":" Π","inline":true,"padRight":true},{"text":"to be the set of all strongly stable policies; we defer the formal definition of this class of policies to ","element":"span"},{"text":"Section 3 ","element":"span"},{"text":"below.","element":"span"}]]},{"heading":"3 Strong Stability","paragraphs":[[{"text":"In this section we formalize the notion of a strongly stable policy and discuss some of its properties. Intuitively, a strongly stable policy is a policy that exhibits fast mixing and converges quickly to a steady-state distribution. Note that, while stable policies ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K ","element":"span"},{"text":"(for which ","element":"span"},{"style":{"height":16},"width":284.75,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-37.png","element":"img","alt":" ρ(A + BK) < 1","inline":true},{"text":") necessarily converge to a steady-state, nothing is guaranteed regarding their rate of convergence. 
The following definition helps remedy that.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Definition 3.1 ","element":"span"},{"text":"(Strong Stability)","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"text":"A policy ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K ","element":"span"},{"text":"is ","element":"span"},{"style":{"height":16},"width":362.61,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-38.png","element":"img","alt":" (κ, γ)-strongly stable","inline":true,"padRight":true},{"text":"(for ","element":"span"},{"style":{"height":11.6},"width":103.38,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-39.png","element":"img","alt":" κ > 0","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":14.4},"width":184.01,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-40.png","element":"img","alt":" 0 < γ ≤ 1","inline":true},{"text":") if ","element":"span"},{"style":{"height":16},"width":161.69,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-41.png","element":"img","alt":"∥K∥ ≤ κ","inline":true},{"text":", and there exist matrices ","element":"span"},{"style":{"fontStyle":"italic"},"text":"L ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H ","element":"span"},{"text":"such that ","element":"span"},{"style":{"height":14.58},"width":355.26,"height":36.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-42.png","element":"img","alt":" A + BK = HLH−1","inline":true},{"text":", with ","element":"span"},{"style":{"height":16},"width":223.76,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-43.png","element":"img","alt":" ∥L∥ ≤ 1 − γ","inline":true,"padRight":true},{"text":"and 
","element":"span"},{"style":{"height":17.39},"width":282.31,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-44.png","element":"img","alt":"∥H∥∥H−1∥ ≤ κ.","inline":true}],[{"text":"Strong-stability is a quantitative version of stability, in the sense that any stable policy is strongly-stable for some ","element":"span"},{"style":{"height":7.2},"width":23,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-45.png","element":"img","alt":" κ","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":10.4},"width":22,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-46.png","element":"img","alt":" γ","inline":true,"padRight":true},{"text":"(See ","element":"span"},{"href":"#id-29","text":"Lemma B.1 ","element":"a"},{"text":"in the supplementary material). Conversely, strong-stability implies stability: if ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K ","element":"span"},{"text":"is strongly-stable then ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"+ ","element":"span"},{"style":{"fontStyle":"italic"},"text":"BK ","element":"span"},{"text":"is similar to a matrix ","element":"span"},{"style":{"fontStyle":"italic"},"text":"L ","element":"span"},{"text":"with ","element":"span"},{"style":{"height":16},"width":153.16,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/2-47.png","element":"img","alt":" ∥L∥ < 1","inline":true},{"text":", and so ","element":"span"},{"style":{"height":16},"width":569.28,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/3-0.png","element":"img","alt":" ρ(A + BK) = ρ(L) ≤ ∥L∥ < 1","inline":true},{"text":", i.e., ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K ","element":"span"},{"text":"is stable. 
Notice that for a strongly stable ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K","element":"span"},{"text":", although ","element":"span"},{"style":{"height":15.6},"width":267.99,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/3-1.png","element":"img","alt":"ρ(A + BK) < 1","inline":true},{"text":", it may not be the case that ","element":"span"},{"style":{"height":16},"width":256.86,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/3-2.png","element":"img","alt":" ∥A + BK∥ < 1","inline":true},{"text":", and a non-trivial transformation ","element":"span"},{"style":{"height":15.2},"width":194.01,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/3-3.png","element":"img","alt":" H ≠ I may","inline":true,"padRight":true},{"text":"be required to make the norm smaller than one (this is indeed the case with feasible solutions to our SDP relaxation).","element":"span"}],[{"text":"Strong stability ensures exponentially fast convergence to steady-state, as is made precise in the next lemma.","element":"span"}],[{"style":{"width":"0%"},"width":9,"height":3,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/3-4.png","element":"img"}],[{"id":"id-59","style":{"fontWeight":"bold"},"text":"Lemma 3.2. ","element":"span"},{"text":"For all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"= 1","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"text":"2","element":"span"},{"style":{"fontStyle":"italic"},"text":", . . . 
","element":"span"},{"text":"let ","element":"span"},{"style":{"height":13.19},"width":48.14,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/3-5.png","element":"img","alt":"X̂t","inline":true,"padRight":true},{"text":"be the state covariance matrix on round ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"starting from some ","element":"span"},{"style":{"height":13.2},"width":130,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/3-6.png","element":"img","alt":"X̂0 ⪰ 0","inline":true,"padRight":true},{"text":"and following a (","element":"span"},{"style":{"height":10.8},"width":62.69,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/3-7.png","element":"img","alt":"κ, γ","inline":true},{"text":")-strongly stable policy ","element":"span"},{"style":{"height":16},"width":194.84,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/3-8.png","element":"img","alt":" π(x) = Kx","inline":true},{"text":". Then ","element":"span"},{"style":{"height":14},"width":189.91,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/3-9.png","element":"img","alt":"X̂1, X̂2, . . 
.","inline":true,"padRight":true},{"text":"approaches a steady-state covariance matrix ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":", and further, for all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"it holds that","element":"span"}],[{"style":{"width":"30%"},"width":543,"height":50,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/3-10.png","element":"img"}],[{"text":"This exponential convergence is true even if the policy is randomized and follows ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K ","element":"span"},{"text":"in expectation; that is, if ","element":"span"},{"style":{"height":16},"width":273.3,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/3-11.png","element":"img","alt":" E[π(x)|x] = Kx","inline":true},{"text":", and provided that ","element":"span"},{"style":{"height":16},"width":355.93,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/3-12.png","element":"img","alt":" Cov[π(x)|x] is finite.","inline":true}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"Let us first analyze deterministic policies. As noted above, we know that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K ","element":"span"},{"text":"is stable and as a result the state covariances ","element":"span"},{"style":{"height":13.19},"width":48.14,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/3-13.png","element":"img","alt":"X̂t","inline":true,"padRight":true},{"text":"approach a steady-state covariance ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":". 
By definition, we have","element":"span"}],[{"style":{"width":"48%"},"width":873,"height":117,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/3-14.png","element":"img"}],[{"text":"Subtracting the equations and recursing, we have ","element":"span"},{"style":{"height":17.79},"width":807.17,"height":44.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/3-15.png","element":"img","alt":"X̂t − X = (A + BK)ᵗ(X̂0 − X)((A + BK)ᵗ)ᵀ,","inline":true,"padRight":true},{"text":"which gives","element":"span"}],[{"style":{"width":"36%"},"width":661,"height":50,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/3-16.png","element":"img"}],[{"text":"To further bound the right-hand side, observe that ","element":"span"},{"style":{"height":17.38},"width":501.31,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/3-17.png","element":"img","alt":" (A + BK)ᵗ = HLᵗH⁻¹, thus","inline":true}],[{"style":{"width":"51%"},"width":929,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/3-18.png","element":"img"}],[{"text":"Combining the inequalities gives the result for deterministic policies. 
For randomized policies with ","element":"span"},{"text":"E","element":"span"},{"text":"[","element":"span"},{"style":{"fontStyle":"italic"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"|","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"text":"] = ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Kx ","element":"span"},{"text":"and finite ","element":"span"},{"style":{"fontStyle":"italic"},"text":"V ","element":"span"},{"text":"= Cov[","element":"span"},{"style":{"fontStyle":"italic"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"|","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"text":"]","element":"span"},{"text":", the dynamics of the state covariance take the form","element":"span"}],[{"style":{"width":"55%"},"width":997,"height":116,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/3-19.png","element":"img"}],[{"text":"Since the analysis above only depends on the difference between the equations, the added ","element":"span"},{"style":{"height":13.78},"width":113.83,"height":34.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/3-20.png","element":"img","alt":" BV BT","inline":true,"padRight":true},{"text":"term has no effect on the convergence of ","element":"span"},{"style":{"height":13.19},"width":45.02,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/3-21.png","element":"img","alt":" Xt","inline":true},{"text":". 
Note, however, that the steady state ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"itself will be a function of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"V ","element":"span"},{"text":"in general.","element":"span"}],[{"text":"Let us state one more property of strongly stable policies that will be useful in our analysis.","element":"span"}],[{"id":"id-47","style":{"fontWeight":"bold"},"text":"Lemma 3.3. ","element":"span"},{"text":"Assume that ","element":"span"},{"style":{"height":15.6},"width":182.65,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/3-22.png","element":"img","alt":" K is (κ, γ)","inline":true},{"text":"-strongly stable, and let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"U ","element":"span"},{"text":"be the covariances of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"u ","element":"span"},{"text":"at steady-state when following ","element":"span"},{"style":{"height":17.39},"width":1062.3,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/3-23.png","element":"img","alt":" K. Then Tr(X) ≤ (κ²/γ) Tr(W) and Tr(U) ≤ (κ⁴/γ) Tr(W).","inline":true}],[{"style":{"fontWeight":"bold"},"text":"3.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Sequential strong stability","element":"span"}],[{"text":"We next present a stronger notion of strong stability which plays a central role in our analysis. Roughly speaking, the goal is to argue about fast mixing when following a sequence of different policies ","element":"span"},{"style":{"height":14},"width":185.32,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/3-24.png","element":"img","alt":" K1, K2, . . 
.","inline":true,"padRight":true},{"text":"(rather than a fixed policy ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K ","element":"span"},{"text":"throughout). In this case, for any kind of mixing to take place, one has to require not only that each policy is strongly stable, but also that the sequence is “slowly changing.” This motivates the following definition.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Definition 3.4 ","element":"span"},{"text":"(sequential strong stability)","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"text":"A sequence of policies ","element":"span"},{"style":{"height":16},"width":663.1,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/3-25.png","element":"img","alt":" K1, . . . , KT is (κ, γ)-strongly stable (for","inline":true},{"style":{"height":14.8},"width":353.82,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/3-26.png","element":"img","alt":"κ > 0 and 0 < γ ≤ 1","inline":true},{"text":") if there exist matrices ","element":"span"},{"style":{"height":18.34},"width":1035.73,"height":45.86,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/3-27.png","element":"img","alt":" H1, . . . , HT and L1, . . . 
, LT such that A + BKt = HtLtH−1t","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":", with the following properties:","element":"span"}],[{"style":{"width":"30%"},"width":548,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/3-28.png","element":"img"}],[{"style":{"width":"46%"},"width":833,"height":98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-0.png","element":"img"}],[{"text":"Strongly stable sequences mix quickly, in the following sense (proof is deferred to ","element":"span"},{"text":"Appendix A)","element":"span"},{"text":".","element":"span"}],[{"id":"id-52","style":{"fontWeight":"bold"},"text":"Lemma 3.5. ","element":"span"},{"text":"Let ","element":"span"},{"style":{"height":16.4},"width":465.98,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-1.png","element":"img","alt":" πt(x) = Ktx (t = 1, 2, . . .","inline":true},{"text":") be a sequence of policies with respective steady-state covariance matrices ","element":"span"},{"style":{"height":14},"width":183.66,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-2.png","element":"img","alt":" X1, X2, . . .","inline":true},{"text":", such that ","element":"span"},{"style":{"height":15.6},"width":351.78,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-3.png","element":"img","alt":" K1, K2, . . . is a (κ, γ)","inline":true},{"text":"-strongly stable sequence and ","element":"span"},{"style":{"height":16},"width":248.31,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-4.png","element":"img","alt":" ∥Xt−Xt−1∥ ≤","inline":true},{"style":{"height":14.8},"width":589.65,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-5.png","element":"img","alt":"η for all t, for some η > 0. 
Let X̂t","inline":true,"padRight":true},{"text":"be the state covariance matrix on round ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"starting from some ","element":"span"},{"style":{"height":13.2},"width":127.15,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-6.png","element":"img","alt":"X̂1 ⪰ 0","inline":true,"padRight":true},{"text":"and following this sequence. Then","element":"span"}],[{"style":{"width":"42%"},"width":773,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-7.png","element":"img"}],[{"text":"The same is true even if the policies are randomized, such that ","element":"span"},{"style":{"height":16},"width":709.54,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-8.png","element":"img","alt":" E[πt(x)|x] = Ktx and Cov[πt(x)|x] exists","inline":true,"padRight":true},{"text":"and is finite.","element":"span"}]]},{"heading":"4 SDP Relaxation for LQ control","paragraphs":[[{"text":"We now present our SDP relaxation for the infinite-horizon LQ control problem. Our presentation requires the following definitions. Consider an LQ control problem parameterized by matrices ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A, B, Q, R ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"W","element":"span"},{"text":". 
For any stable policy (for which a steady-state distribution exists), define","element":"span"}],[{"id":"id-31","style":{"width":"61%"},"width":1104,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-9.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x ","element":"span"},{"text":"is distributed according to the steady-state distribution of ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-10.png","element":"img","alt":" π","inline":true},{"text":", and ","element":"span"},{"style":{"height":16},"width":154.91,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-11.png","element":"img","alt":" u = π(x)","inline":true},{"text":". Then, the infinite-horizon cost of ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-12.png","element":"img","alt":" π","inline":true,"padRight":true},{"text":"is given by ","element":"span"},{"style":{"height":19.79},"width":356.35,"height":49.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-13.png","element":"img","alt":" J(π) = [Q 0; 0 R] • E(π)","inline":true},{"text":". 
For a policy ","element":"span"},{"style":{"height":16},"width":219.92,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-14.png","element":"img","alt":" πK(x) = Kx","inline":true,"padRight":true},{"text":"defined by a stable control ","element":"span"},{"text":"matrix ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K ","element":"span"},{"text":"(i.e., for which ","element":"span"},{"style":{"height":16},"width":272.25,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-15.png","element":"img","alt":" ρ(A + BK) < 1","inline":true},{"text":"), this matrix takes the form","element":"span"}],[{"style":{"width":"62%"},"width":1129,"height":96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-16.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"is the state covariance at steady-state. (We slightly abuse notation and write ","element":"span"},{"style":{"fontStyle":"italic"},"text":"E","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"K","element":"span"},{"text":") ","element":"span"},{"text":"instead of ","element":"span"},{"style":{"height":16},"width":109.53,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-17.png","element":"img","alt":"E(πK)","inline":true},{"text":"). 
In this case, one also has ","element":"span"},{"style":{"height":17.78},"width":666.37,"height":44.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-18.png","element":"img","alt":" J(K) = J(E(K)) = (Q + KTRK) • X.","inline":true}],[{"style":{"fontWeight":"bold"},"text":"4.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"The relaxation","element":"span"}],[{"text":"We can now present our SDP relaxation for the LQ control problem given by ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"A, B, Q, R, W","element":"span"},{"text":")","element":"span"},{"text":", which takes the form:","element":"span"}],[{"id":"id-30","style":{"width":"79%"},"width":1442,"height":225,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-19.png","element":"img"}],[{"text":"Here, ","element":"span"},{"style":{"height":11.6},"width":95.36,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-20.png","element":"img","alt":" ν > 0","inline":true,"padRight":true},{"text":"is a parameter whose value will be determined later, and ","element":"span"},{"style":{"height":16},"width":599.14,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-21.png","element":"img","alt":" Σ is a (d + k) × (d + k) symmetric","inline":true,"padRight":true},{"text":"matrix that decomposes to blocks as follows:","element":"span"}],[{"style":{"width":"59%"},"width":1080,"height":209,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-22.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Lemma 4.1. 
","element":"span"},{"text":"For any stable policy ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-23.png","element":"img","alt":" π","inline":true,"padRight":true},{"text":"such that at steady-state ","element":"span"},{"style":{"height":17.39},"width":334.08,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-24.png","element":"img","alt":" E∥x∥2 + E∥u∥2 ≤ ν","inline":true},{"text":", the matrix ","element":"span"},{"style":{"height":15.6},"width":159.65,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-25.png","element":"img","alt":" Σ = E(π)","inline":true,"padRight":true},{"text":"is feasible for ","element":"span"},{"href":"#id-30","text":"(3)","element":"a"},{"text":".","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"Let ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-26.png","element":"img","alt":" π","inline":true,"padRight":true},{"text":"be any stable policy and consider the matrix ","element":"span"},{"style":{"height":16},"width":179,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-27.png","element":"img","alt":" Σ = E(π)","inline":true},{"text":". Then ","element":"span"},{"style":{"height":13.2},"width":117.83,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-28.png","element":"img","alt":" Σ ⪰ 0","inline":true,"padRight":true},{"text":"(by definition, recall ","element":"span"},{"href":"#id-31","text":"Eq. 
(1))","element":"a"},{"text":", and satisfies the equality constraint of ","element":"span"},{"href":"#id-30","text":"(3)","element":"a"},{"text":", since if ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x ","element":"span"},{"text":"is at steady-state and ","element":"span"},{"style":{"height":16},"width":160.76,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-29.png","element":"img","alt":" u = π(x)","inline":true},{"text":", then ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Ax ","element":"span"},{"text":"+ ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Bu ","element":"span"},{"text":"+ ","element":"span"},{"style":{"fontStyle":"italic"},"text":"w ","element":"span"},{"text":"has the same distribution as ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x ","element":"span"},{"text":"for ","element":"span"},{"style":{"height":16.4},"width":248.39,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-30.png","element":"img","alt":" w ∼ N(0, W)","inline":true,"padRight":true},{"text":"independent of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"u","element":"span"},{"text":", thus ","element":"span"},{"style":{"height":17.79},"width":788.34,"height":44.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-31.png","element":"img","alt":"E[xxT] = E[(Ax + Bu + w)(Ax + Bu + w)T];","inline":true,"padRight":true},{"text":"the latter is equivalent to ","element":"span"},{"style":{"height":17.79},"width":531.59,"height":44.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-32.png","element":"img","alt":" Σxx = (A B)Σ(A B)T + W","inline":true},{"text":". 
Finally, observe that ","element":"span"},{"style":{"height":17.78},"width":1015.55,"height":44.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-33.png","element":"img","alt":" Tr(Σ) = E Tr(xxT)+E Tr(uuT) = E∥x∥2+E∥u∥2 where x, u","inline":true,"padRight":true},{"text":"are distributed according to the steady-state distribution of ","element":"span"},{"style":{"height":14.4},"width":188.16,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/4-34.png","element":"img","alt":" π, hence Σ","inline":true,"padRight":true},{"text":"satisfies the trace constraint.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"4.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Extracting a policy","element":"span"}],[{"text":"We next show that from any feasible solution to the SDP, one can extract a stable policy with the same (if not better) cost, provided that ","element":"span"},{"style":{"height":11.6},"width":116.3,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-0.png","element":"img","alt":" W ≻ 0","inline":true},{"text":". 
For any feasible solution ","element":"span"},{"style":{"height":10.8},"width":28,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-1.png","element":"img","alt":" Σ","inline":true,"padRight":true},{"text":"for the SDP, define a control matrix as follows:","element":"span"}],[{"style":{"width":"57%"},"width":1045,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-2.png","element":"img"}],[{"text":"Note that, due to the equality constraint of the SDP, our assumption ","element":"span"},{"style":{"height":11.6},"width":116.36,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-3.png","element":"img","alt":" W ≻ 0","inline":true,"padRight":true},{"text":"ensures that ","element":"span"},{"style":{"height":14.4},"width":238.11,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-4.png","element":"img","alt":" Σxx ≻ 0, thus","inline":true},{"style":{"height":13.19},"width":64.86,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-5.png","element":"img","alt":"Σxx","inline":true,"padRight":true},{"text":"is nonsingular and ","element":"span"},{"style":{"height":16},"width":91.22,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-6.png","element":"img","alt":" K(Σ)","inline":true,"padRight":true},{"text":"is well defined.","element":"span"}],[{"id":"id-32","style":{"height":11.6},"width":377.8,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-7.png","element":"img","alt":"Theorem 4.2. Let Σ","inline":true,"padRight":true},{"text":"be any feasible solution to the SDP, and let ","element":"span"},{"style":{"height":15.6},"width":178.54,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-8.png","element":"img","alt":" K = K(Σ)","inline":true},{"text":". 
Then the policy ","element":"span"},{"style":{"height":15.6},"width":189.52,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-9.png","element":"img","alt":" π(x) = Kx","inline":true,"padRight":true},{"text":"is stable, and it holds that ","element":"span"},{"style":{"height":16},"width":174.58,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-10.png","element":"img","alt":" E(K) ⪯ Σ","inline":true},{"text":". In particular, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"E","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"K","element":"span"},{"text":") ","element":"span"},{"text":"is also feasible for the SDP and its cost is at most that of ","element":"span"},{"style":{"height":10.8},"width":39.78,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-11.png","element":"img","alt":" Σ.","inline":true}],[{"text":"Without the trace constraint, the theorem particularly implies that for the optimal solution ","element":"span"},{"style":{"height":10.8},"width":43.36,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-12.png","element":"img","alt":" Σ⋆","inline":true,"padRight":true},{"text":"of the SDP, the corresponding control matrix ","element":"span"},{"style":{"height":16},"width":220.74,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-13.png","element":"img","alt":" K⋆ = K(Σ⋆)","inline":true,"padRight":true},{"text":"is an optimal policy for the original problem, recovering a classic result in control theory.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Proof of ","element":"span"},{"href":"#id-32","style":{"fontStyle":"italic"},"text":"Theorem 4.2. 
","element":"a"},{"text":"Our first step is to show that","element":"span"}],[{"id":"id-33","style":{"width":"64%"},"width":1174,"height":96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-14.png","element":"img"}],[{"text":"To see this, observe that by definition of ","element":"span"},{"style":{"height":16},"width":331.33,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-15.png","element":"img","alt":" K = K(Σ) we have","inline":true}],[{"style":{"width":"35%"},"width":640,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-16.png","element":"img"}],[{"text":"Thus, it suffices to show that ","element":"span"},{"style":{"height":17.72},"width":317.96,"height":44.31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-17.png","element":"img","alt":" Σuu − ΣTuxΣ−1xx Σux ","inline":true,"padRight":true},{"text":"is PSD. The latter matrix is the Schur complement of ","element":"span"},{"style":{"height":13.6},"width":39.21,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-18.png","element":"img","alt":" Σ,","inline":true,"padRight":true},{"text":"and is PSD because ","element":"span"},{"style":{"height":11.6},"width":172.77,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-19.png","element":"img","alt":" Σ is PSD.","inline":true}],[{"text":"Next, we show that the control matrix ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K ","element":"span"},{"text":"gives rise to a stable policy. Let us develop ","element":"span"},{"href":"#id-30","text":"Eq. (3). 
","element":"a"},{"text":"First, since ","element":"span"},{"style":{"height":11.6},"width":116.3,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-20.png","element":"img","alt":" W ≻ 0","inline":true,"padRight":true},{"text":"we also have that ","element":"span"},{"style":{"height":13.19},"width":140.05,"height":32.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-21.png","element":"img","alt":" Σxx ≻ 0","inline":true},{"text":". Moreover, by ","element":"span"},{"href":"#id-33","text":"Eq. (5),","element":"a"}],[{"style":{"width":"36%"},"width":652,"height":179,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-22.png","element":"img"}],[{"text":"Let ","element":"span"},{"style":{"height":11.6},"width":133,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-23.png","element":"img","alt":" λ and v","inline":true,"padRight":true},{"text":"be a (possibly complex) eigenvalue and left-eigenvector associated with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"+ ","element":"span"},{"style":{"fontStyle":"italic"},"text":"BK","element":"span"},{"text":". Then,","element":"span"}],[{"style":{"width":"52%"},"width":948,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-24.png","element":"img"}],[{"text":"which, by ","element":"span"},{"style":{"height":16},"width":474.66,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-25.png","element":"img","alt":" v∗Σxxv > 0, implies |λ| < 1","inline":true},{"text":". 
This is true for all eigenvalues ","element":"span"},{"style":{"height":10.8},"width":23,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-26.png","element":"img","alt":" λ","inline":true},{"text":", and shows that ","element":"span"},{"style":{"height":15.6},"width":281.73,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-27.png","element":"img","alt":" ρ(A + BK) < 1,","inline":true,"padRight":true},{"text":"that is, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K ","element":"span"},{"text":"is stable.","element":"span"}],[{"text":"Finally, let us show that ","element":"span"},{"style":{"height":16},"width":189.02,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-28.png","element":"img","alt":" E(K) ⪯ Σ′","inline":true},{"text":", which together with ","element":"span"},{"href":"#id-33","text":"Eq. (5) ","element":"a"},{"text":"would imply our claim ","element":"span"},{"style":{"height":16},"width":174.86,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-29.png","element":"img","alt":" E(K) ⪯ Σ","inline":true},{"text":". 
Denote by ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"the state covariance at steady-state when following ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K","element":"span"},{"text":"; then,","element":"span"}],[{"style":{"width":"25%"},"width":460,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-30.png","element":"img"}],[{"text":"To establish that ","element":"span"},{"style":{"height":16},"width":188.2,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-31.png","element":"img","alt":" E(K) ⪯ Σ′ ","inline":true,"padRight":true},{"text":"it is enough to show ","element":"span"},{"style":{"height":13.2},"width":154.14,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-32.png","element":"img","alt":" X ⪯ Σxx","inline":true},{"text":". To this end, let ","element":"span"},{"style":{"height":13.99},"width":417.79,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-33.png","element":"img","alt":" ∆ = Σxx − X and write","inline":true}],[{"style":{"width":"38%"},"width":687,"height":179,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-34.png","element":"img"}],[{"text":"from which we get ","element":"span"},{"style":{"height":17.78},"width":493.85,"height":44.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-35.png","element":"img","alt":" ∆ ⪰ (A + BK)∆(A + BK)T","inline":true},{"text":". 
Applying the latter inequality recursively, we obtain","element":"span"}],[{"style":{"width":"32%"},"width":591,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/5-36.png","element":"img"}],[{"text":"Recall that ","element":"span"},{"style":{"height":16},"width":273.3,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-0.png","element":"img","alt":" ρ(A + BK) < 1","inline":true},{"text":"; thus, taking the limit as ","element":"span"},{"style":{"height":8.8},"width":125.94,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-1.png","element":"img","alt":" n → ∞","inline":true},{"text":", we get ","element":"span"},{"style":{"height":17.78},"width":580.43,"height":44.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-2.png","element":"img","alt":" (A + BK)n∆((A + BK)T)n → 0,","inline":true,"padRight":true},{"text":"which implies ","element":"span"},{"style":{"height":14},"width":106.35,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-3.png","element":"img","alt":" ∆ ⪰ 0","inline":true},{"text":". 
This shows that ","element":"span"},{"style":{"height":13.2},"width":154.14,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-4.png","element":"img","alt":" X ⪯ Σxx","inline":true},{"text":", as required.","element":"span"}],[{"text":"To complete the proof observe that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"E","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"K","element":"span"},{"text":") ","element":"span"},{"text":"is feasible for the SDP since ","element":"span"},{"style":{"height":16},"width":176.1,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-5.png","element":"img","alt":" E(K) ⪯ Σ","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":10.8},"width":29,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-6.png","element":"img","alt":" Σ","inline":true,"padRight":true},{"text":"is feasible. Furthermore, since ","element":"span"},{"style":{"height":19.79},"width":105.09,"height":49.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-7.png","element":"img","alt":" ( Q 0 ; 0 R )","inline":true,"padRight":true},{"text":"is PSD, we have","element":"span"}],[{"style":{"width":"75%"},"width":1361,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-8.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"4.3 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Strong stability of solutions","element":"span"}],[{"text":"Let us show that from a solution to the SDP one can extract a strongly stable policy.","element":"span"}],[{"id":"id-38","style":{"fontWeight":"bold"},"text":"Lemma 4.3. 
","element":"span"},{"text":"Assume that ","element":"span"},{"style":{"height":15.78},"width":158.38,"height":39.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-9.png","element":"img","alt":" W ⪰ σ²I","inline":true,"padRight":true},{"text":"and let ","element":"span"},{"style":{"height":16.28},"width":175.08,"height":40.71,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-10.png","element":"img","alt":" κ = √ν/σ","inline":true},{"text":". Then for any feasible solution ","element":"span"},{"style":{"height":10.8},"width":29,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-11.png","element":"img","alt":" Σ","inline":true,"padRight":true},{"text":"for the SDP, the policy ","element":"span"},{"style":{"height":17.38},"width":406.67,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-12.png","element":"img","alt":" K = K(Σ) is (κ, 1/(2κ²))","inline":true},{"text":"-strongly stable.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"According to ","element":"span"},{"href":"#id-32","text":"Theorem 4.2, ","element":"a"},{"text":"the policy ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K ","element":"span"},{"text":"is (weakly) stable and the matrix ","element":"span"},{"style":{"height":16},"width":175.31,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-13.png","element":"img","alt":" Σ̂ = E(K)","inline":true,"padRight":true},{"text":"is feasible for the SDP. Let ","element":"span"},{"style":{"height":13.19},"width":156.43,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-14.png","element":"img","alt":" X = Σ̂xx","inline":true,"padRight":true},{"text":"be the state covariance of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K ","element":"span"},{"text":"at steady-state. 
Since ","element":"span"},{"style":{"height":10.8},"width":29,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-15.png","element":"img","alt":" Σ̂","inline":true,"padRight":true},{"text":"is feasible, and since","element":"span"}],[{"id":"id-34","style":{"width":"99%"},"width":1795,"height":107,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-16.png","element":"img"}],[{"text":"In particular, this means that ","element":"span"},{"style":{"height":15.78},"width":151.36,"height":39.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-17.png","element":"img","alt":" X ⪰ σ²I","inline":true},{"text":". On the other hand, we have ","element":"span"},{"style":{"height":16},"width":336.81,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-18.png","element":"img","alt":" Tr(X) ≤ Tr(Σ̂) ≤ ν","inline":true},{"text":", thus ","element":"span"},{"style":{"height":13.2},"width":131.52,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-19.png","element":"img","alt":" X ⪯ νI","inline":true},{"text":". Overall,","element":"span"}],[{"id":"id-35","style":{"width":"56%"},"width":1027,"height":46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-20.png","element":"img"}],[{"text":"Given that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"is nonsingular, we can define ","element":"span"},{"style":{"height":18.19},"width":444.28,"height":45.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-21.png","element":"img","alt":" L = X^(−1/2)(A+BK)X^(1/2)","inline":true},{"text":". Multiplying ","element":"span"},{"href":"#id-34","text":"Eq. 
(6) ","element":"a"},{"text":"by ","element":"span"},{"style":{"height":14.59},"width":201.17,"height":36.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-22.png","element":"img","alt":" X−1/2 from","inline":true,"padRight":true},{"text":"both sides, we obtain ","element":"span"},{"style":{"height":16.19},"width":590.41,"height":40.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-23.png","element":"img","alt":" I ⪰ LLT + σ2X−1 ⪰ LLT + κ−2I.","inline":true,"padRight":true},{"text":"Thus ","element":"span"},{"style":{"height":17.79},"width":312.39,"height":44.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-24.png","element":"img","alt":" LLT ⪯ (1 − κ−2)I","inline":true},{"text":", so ","element":"span"},{"style":{"height":18.08},"width":329.79,"height":45.21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-25.png","element":"img","alt":" ∥L∥ ≤√1 − κ−2 ≤","inline":true},{"style":{"height":17.38},"width":174.46,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-26.png","element":"img","alt":"1 − κ−2/2","inline":true},{"text":". Also, ","element":"span"},{"href":"#id-35","text":"Eq. (7) ","element":"a"},{"text":"shows that ","element":"span"},{"style":{"height":18.18},"width":353.21,"height":45.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-27.png","element":"img","alt":" ∥X1/2∥∥X−1/2∥ ≤ κ","inline":true},{"text":". It is left to establish the bound on the norm ","element":"span"},{"style":{"height":16},"width":93.55,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-28.png","element":"img","alt":"∥K∥F","inline":true},{"text":". 
To this end, use the fact that","element":"span"}],[{"style":{"width":"38%"},"width":689,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-29.png","element":"img"}],[{"text":"together with ","element":"span"},{"style":{"height":15.78},"width":151.36,"height":39.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-30.png","element":"img","alt":" X ⪰ σ2I","inline":true,"padRight":true},{"text":"(recall ","element":"span"},{"href":"#id-35","text":"Eq. (7)) ","element":"a"},{"text":"to obtain ","element":"span"},{"style":{"height":17.9},"width":557.3,"height":44.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-31.png","element":"img","alt":" σ2∥K∥2F ≤ ν, that is, ∥K∥F ≤ κ.","inline":true}],[{"text":"We can also prove an analogous statement for sequences of feasible solutions, provided that they change slowly enough (we defer the proof to ","element":"span"},{"text":"Appendix A)","element":"span"},{"text":".","element":"span"}],[{"id":"id-54","style":{"fontWeight":"bold"},"text":"Lemma 4.4. ","element":"span"},{"text":"Assume that ","element":"span"},{"style":{"height":17.38},"width":745.15,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-32.png","element":"img","alt":" W ⪰ σ2I and let κ = √ν/σ. Let Σ1, Σ2, . . 
.","inline":true,"padRight":true},{"text":"be a sequence of feasible solutions of ","element":"span"},{"href":"#id-30","text":"(3)","element":"a"},{"text":", and suppose that ","element":"span"},{"style":{"height":16},"width":290.26,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-33.png","element":"img","alt":" ∥Σt+1 − Σt∥ ≤ η","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"for some ","element":"span"},{"style":{"height":17.39},"width":176.34,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-34.png","element":"img","alt":" η ≤ σ²/κ²","inline":true},{"text":". Then the sequence ","element":"span"},{"style":{"height":14},"width":185.32,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-35.png","element":"img","alt":" K1, K2, . . .","inline":true},{"text":", where ","element":"span"},{"style":{"height":17.39},"width":575.86,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-36.png","element":"img","alt":" Kt = K(Σt) for all t is (κ, 1/(2κ²))","inline":true},{"text":"-strongly stable.","element":"span"}]]},{"heading":"5 Online LQ Control","paragraphs":[[{"text":"In this section we describe our gradient-based algorithm for online LQ control, presented in ","element":"span"},{"href":"#id-36","text":"Algorithm 1. ","element":"a"},{"text":"The algorithm maintains an “ideal” steady-state covariance matrix ","element":"span"},{"style":{"height":13.19},"width":41.35,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-37.png","element":"img","alt":" Σt","inline":true,"padRight":true},{"text":"by performing online gradient steps directly on the SDP we formulated in ","element":"span"},{"text":"Section 4 ","element":"span"},{"text":"(with the linear cost functions changing from round to round). 
Then, a control matrix ","element":"span"},{"style":{"height":13.19},"width":45.84,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-38.png","element":"img","alt":" Kt","inline":true,"padRight":true},{"text":"is extracted from the covariance ","element":"span"},{"style":{"height":13.19},"width":41.36,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-39.png","element":"img","alt":" Σt","inline":true,"padRight":true},{"text":"and is used to generate a prediction.","element":"span"}],[{"text":"Notice that the predictions made by the algorithm are randomly drawn from the Gaussian ","element":"span"},{"style":{"height":16.4},"width":219.61,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-40.png","element":"img","alt":" N(Ktxt, Vt),","inline":true,"padRight":true},{"text":"and only follow the extracted policies ","element":"span"},{"style":{"height":14},"width":172.05,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/6-41.png","element":"img","alt":" K1, K2, ...","inline":true,"padRight":true},{"text":"in expectation. This randomization step is crucial for the algorithm to exhibit fast mixing: sampling the prediction from a distribution with the right covariance ensures the observed covariance matrices converge to those generated by the algorithm, and consequently this sequence “mixes” more quickly.","element":"span"}],[{"id":"id-37","text":"For ","element":"span"},{"href":"#id-36","text":"Algorithm 1 ","element":"a"},{"text":"we prove the following guarantee.","element":"span"}],[{"id":"id-36","style":{"width":"99%"},"width":1806,"height":758,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/7-0.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Theorem 5.1. 
","element":"span"},{"text":"Assume that ","element":"span"},{"style":{"height":17.39},"width":209.05,"height":43.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/7-1.png","element":"img","alt":" Tr(W) ≤ λ2","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":15.79},"width":159.19,"height":39.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/7-2.png","element":"img","alt":" W ⪰ σ2I","inline":true},{"text":". Given ","element":"span"},{"style":{"height":11.6},"width":96.91,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/7-3.png","element":"img","alt":" κ > 0","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":14.4},"width":171.06,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/7-4.png","element":"img","alt":" 0 ≤ γ < 1","inline":true},{"text":", set ","element":"span"},{"style":{"height":17.39},"width":221,"height":43.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/7-5.png","element":"img","alt":" ν = 2κ4λ2/γ","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":18.3},"width":301.04,"height":45.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/7-6.png","element":"img","alt":" η = σ3/(2C√νT)","inline":true},{"text":". 
The expected regret of ","element":"span"},{"href":"#id-36","text":"Algorithm 1 ","element":"a"},{"text":"compared to any ","element":"span"},{"style":{"height":15.6},"width":93.7,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/7-7.png","element":"img","alt":" (κ, γ)","inline":true},{"text":"-strongly stable control matrix ","element":"span"},{"style":{"height":11.6},"width":241.91,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/7-8.png","element":"img","alt":" K⋆ is at most","inline":true}],[{"style":{"width":"36%"},"width":657,"height":98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/7-9.png","element":"img"}],[{"text":"provided that ","element":"span"},{"style":{"height":17.39},"width":310.68,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/7-10.png","element":"img","alt":" T ≥ 8κ⁴λ²/(γσ²).","inline":true}],[{"text":"We remark that the theorem (in fact, ","element":"span"},{"href":"#id-36","text":"Algorithm 1 ","element":"a"},{"text":"itself) tacitly assumes that the SDP defined by ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"is feasible; otherwise, the set of strongly stable policies is empty and the statement of ","element":"span"},{"href":"#id-37","text":"Theorem 5.1 ","element":"a"},{"text":"is vacuous.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Proof. 
","element":"span"},{"text":"Fix an arbitrary ","element":"span"},{"style":{"height":16},"width":95.32,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/7-11.png","element":"img","alt":" (κ, γ)","inline":true},{"text":"-strongly stable control matrix ","element":"span"},{"style":{"height":10.8},"width":50.7,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/7-12.png","element":"img","alt":" K⋆","inline":true},{"text":", and let ","element":"span"},{"style":{"height":15.19},"width":187.44,"height":37.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/7-13.png","element":"img","alt":" Σ̂⋆1, . . . , Σ̂⋆T","inline":true,"padRight":true},{"text":"be the ","element":"span"},{"text":"covariances induced by using ","element":"span"},{"style":{"height":10.8},"width":50.7,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/7-14.png","element":"img","alt":" K⋆","inline":true,"padRight":true},{"text":"throughout. Also, let ","element":"span"},{"style":{"height":14},"width":187,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/7-15.png","element":"img","alt":" Σ̂1, . . . , Σ̂T","inline":true,"padRight":true},{"text":"be the actual observed covariance matrices induced by the algorithm. 
Denoting ","element":"span"},{"style":{"height":21.38},"width":241.58,"height":53.45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/7-16.png","element":"img","alt":" Lt = ( Qt 0 ; 0 Rt ),","inline":true,"padRight":true},{"text":"the expected regret of the algorithm can then be written as follows:","element":"span"}],[{"id":"id-39","style":{"width":"68%"},"width":1241,"height":395,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/7-17.png","element":"img"}],[{"text":"Observe that the sequence ","element":"span"},{"style":{"height":14},"width":188.15,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/7-18.png","element":"img","alt":" Σ1, . . . , ΣT","inline":true,"padRight":true},{"text":"generated by the algorithm is feasible for the (feasibility) SDP described by the set ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"text":". Thanks to ","element":"span"},{"href":"#id-38","text":"Lemma 4.3, ","element":"a"},{"text":"for any feasible ","element":"span"},{"style":{"height":11.6},"width":103.68,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/7-19.png","element":"img","alt":" Σ ∈ S","inline":true,"padRight":true},{"text":"the corresponding control matrix ","element":"span"},{"style":{"height":16},"width":92.1,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/7-20.png","element":"img","alt":"K(Σ)","inline":true,"padRight":true},{"text":"is ","element":"span"},{"style":{"height":16},"width":95.33,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/7-21.png","element":"img","alt":" (¯κ, ¯γ)","inline":true},{"text":"-strongly stable, for ","element":"span"},{"style":{"height":16.28},"width":178.88,"height":40.7,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/7-22.png","element":"img","alt":" ¯κ = √ν/σ","inline":true,"padRight":true},{"text":"and 
","element":"span"},{"style":{"height":17.39},"width":183.72,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/7-23.png","element":"img","alt":" ¯γ = σ2/2ν","inline":true},{"text":"; in particular, this applies to each of the matrices ","element":"span"},{"style":{"height":13.19},"width":53.81,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/7-24.png","element":"img","alt":" Σt.","inline":true}],[{"text":"We proceed by bounding each of the sums on the right-hand side of ","element":"span"},{"href":"#id-39","text":"Eq. (8). ","element":"a"},{"text":"We start with the second term and use a well-known regret bound for the Online Gradient Descent algorithm, due to ","element":"span"},{"href":"#id-10","referenceIndex":37,"text":"Zinkevich ","element":"a"},{"href":"#id-10","referenceIndex":37,"text":"(2003)","element":"a"},{"text":".","element":"span"}],[{"id":"id-41","style":{"width":"66%"},"width":1205,"height":152,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/7-25.png","element":"img"}],[{"text":"Additionally, the ","element":"span"},{"style":{"height":13.19},"width":40.78,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/7-26.png","element":"img","alt":" Σt","inline":true,"padRight":true},{"text":"are slowly changing in the sense that, for all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":",","element":"span"}],[{"id":"id-40","style":{"width":"59%"},"width":1079,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/7-27.png","element":"img"}],[{"text":"We next bound the first term, now relying on ","element":"span"},{"href":"#id-40","text":"Eq. 
(9) ","element":"a"},{"text":"and the fact that the sequence of (randomized) policies chosen by ","element":"span"},{"href":"#id-36","text":"Algorithm 1 ","element":"a"},{"text":"is strongly stable.","element":"span"}],[{"id":"id-55","style":{"height":17.39},"width":520.07,"height":43.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/8-0.png","element":"img","alt":"Lemma 5.3. If η ≤ σ2/4C¯κ2","inline":true},{"text":", it holds that","element":"span"}],[{"style":{"width":"40%"},"width":728,"height":116,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/8-1.png","element":"img"}],[{"text":"Finally, the last term in ","element":"span"},{"href":"#id-39","text":"Eq. (8) ","element":"a"},{"text":"can be bounded using the strong stability of ","element":"span"},{"style":{"height":10.8},"width":66.02,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/8-2.png","element":"img","alt":" K⋆.","inline":true}],[{"id":"id-42","style":{"height":16},"width":495.42,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/8-3.png","element":"img","alt":"Lemma 5.4. For any (κ, γ)","inline":true},{"text":"-strongly stable ","element":"span"},{"style":{"height":14},"width":66.02,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/8-4.png","element":"img","alt":" K⋆,","inline":true}],[{"style":{"width":"27%"},"width":500,"height":117,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/8-5.png","element":"img"}],[{"text":"The theorem now follows by plugging in the bounds we established in ","element":"span"},{"href":"#id-41","text":"Lemmas 5.2 ","element":"a"},{"text":"to ","element":"span"},{"href":"#id-42","text":"5.4 ","element":"a"},{"text":"into ","element":"span"},{"href":"#id-39","text":"Eq. 
(8) ","element":"a"},{"text":"and setting our choices of ","element":"span"},{"style":{"height":14.8},"width":132.96,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/8-6.png","element":"img","alt":" η and ν","inline":true},{"text":". (See ","element":"span"},{"text":"Appendix A ","element":"span"},{"text":"for details.)","element":"span"}]]},{"heading":"6 Oracle-based Algorithm","paragraphs":[[{"text":"In this section we present a different approach that is based on Follow the Lazy Leader of ","element":"span"},{"href":"#id-11","referenceIndex":24,"text":"Kalai & ","element":"a"},{"href":"#id-11","referenceIndex":24,"text":"Vempala ","element":"a"},{"href":"#id-11","referenceIndex":24,"text":"(2005)","element":"a"},{"text":". In contrast to ","element":"span"},{"href":"#id-36","text":"Algorithm 1, ","element":"a"},{"text":"this approach does not require a lower bound on the noise but rather relies on occasionally performing resets, and needs a bound on the cost of this reset (this is established in ","element":"span"},{"text":"Appendix C ","element":"span"},{"text":"under reasonable assumptions). We assume access to an ","element":"span"},{"text":"Oracle ","element":"span"},{"text":"procedure that receives cost matrices ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"text":", ","element":"span"},{"style":{"fontStyle":"italic"},"text":"R","element":"span"},{"text":", and parameter ","element":"span"},{"style":{"height":11.2},"width":95.35,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/8-7.png","element":"img","alt":" ν > 0","inline":true},{"text":". 
It returns a control matrix ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K ","element":"span"},{"text":"that minimizes the steady-state cost, subject to ","element":"span"},{"style":{"height":17.79},"width":438.35,"height":44.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/8-8.png","element":"img","alt":" Tr(X) + Tr(KXKT) ≤ ν","inline":true},{"text":", where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"is the steady-state covariance matrix associated with ","element":"span"},{"style":{"height":13.38},"width":63.76,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/8-9.png","element":"img","alt":" K.4","inline":true}],[{"id":"id-43","style":{"width":"99%"},"width":1806,"height":858,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/8-10.png","element":"img"}],[{"href":"#id-43","text":"Algorithm 2 ","element":"a"},{"text":"is similar to Follow the Perturbed Leader, and in fact behaves the same in expectation. At every round ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":", ","element":"span"},{"text":"Oracle ","element":"span"},{"text":"is called using the sum of previously seen ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"text":"s and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"R","element":"span"},{"text":"s plus an additional random noise, ","element":"span"},{"style":{"height":16.38},"width":365.32,"height":40.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/8-11.png","element":"img","alt":" Qpt and Rpt . 
Oracle","inline":true,"padRight":true},{"text":"returns a matrix ","element":"span"},{"style":{"height":13.19},"width":45.84,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/8-12.png","element":"img","alt":" Kt ","inline":true,"padRight":true},{"text":"that is used to choose ","element":"span"},{"style":{"height":13.19},"width":185.66,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/8-13.png","element":"img","alt":" ut = Ktxt.","inline":true}],[{"text":"For the measure ","element":"span"},{"style":{"height":14},"width":44.76,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/8-14.png","element":"img","alt":" dµ","inline":true},{"text":", we use the joint measure over symmetric matrices ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"R","element":"span"},{"text":", whose upper triangle is sampled coordinate-wise i.i.d. from Laplace(","element":"span"},{"style":{"height":16},"width":59.71,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/8-15.png","element":"img","alt":"1/η","inline":true},{"text":"). The \"laziness\" of the algorithm stems from ","element":"span"},{"style":{"height":17.15},"width":192.98,"height":42.86,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/8-16.png","element":"img","alt":"Qp1, . . . , QpT","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":17.15},"width":190.8,"height":42.86,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/8-17.png","element":"img","alt":" Rp1, . . . , RpT","inline":true,"padRight":true},{"text":"being sampled dependently over time such that the cumulative perturbed ","element":"span"},{"text":"loss only changes with small probability between rounds. 
Consequently, the expected number of switches of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K ","element":"span"},{"text":"as well as the expected number of resets are only ","element":"span"},{"style":{"height":16},"width":123.54,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/8-18.png","element":"img","alt":" O(ηT).","inline":true}],[{"text":"The reset step in the algorithm, informally, drives the system to zero at some cost. Here we assume that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B ","element":"span"},{"text":"has full column-rank in which case we can reset in one step. In ","element":"span"},{"text":"Appendix C, ","element":"span"},{"text":"we show how resetting can be done over a sequence of steps under much weaker assumptions.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Observation 6.1. ","element":"span"},{"text":"Suppose that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B ","element":"span"},{"text":"has full column-rank. Resetting the system in round ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"can be done by setting ","element":"span"},{"style":{"height":15.78},"width":234.84,"height":39.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-0.png","element":"img","alt":" ut = −B†Axt","inline":true},{"text":", such that at the next round ","element":"span"},{"style":{"height":10.79},"width":211.69,"height":26.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-1.png","element":"img","alt":" xt+1 = wt+1","inline":true},{"text":". 
Moreover, the expected cost of the reset is at most ","element":"span"},{"style":{"height":17.39},"width":300.62,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-2.png","element":"img","alt":" Cν(1 + ∥B†A∥2).","inline":true}],[{"text":"For ","element":"span"},{"href":"#id-43","text":"Algorithm 2 ","element":"a"},{"text":"we will show the following regret bound.","element":"span"}],[{"id":"id-45","style":{"fontWeight":"bold"},"text":"Theorem 6.2. ","element":"span"},{"text":"Assume that ","element":"span"},{"style":{"height":17.38},"width":207.73,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-3.png","element":"img","alt":" Tr(W) ≤ λ2","inline":true},{"text":", and suppose that the cost of a reset is at most ","element":"span"},{"style":{"height":13.19},"width":43.48,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-4.png","element":"img","alt":" Cr","inline":true},{"text":". Then for ","element":"span"},{"style":{"height":17.39},"width":221.36,"height":43.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-5.png","element":"img","alt":"ν = 2κ4λ2/γ","inline":true},{"text":", the expected regret of ","element":"span"},{"href":"#id-43","text":"Algorithm 2 ","element":"a"},{"text":"against any ","element":"span"},{"style":{"height":16},"width":95.32,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-6.png","element":"img","alt":" (κ, γ)","inline":true},{"text":"-strongly-stable control matrix ","element":"span"},{"style":{"height":10.8},"width":50.69,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-7.png","element":"img","alt":" 
K⋆","inline":true}],[{"text":"satisfies","element":"span"}],[{"style":{"width":"52%"},"width":942,"height":53,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-8.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Remark 6.3. ","element":"span"},{"text":"Oracle ","element":"span"},{"text":"requires that the matrices ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"R ","element":"span"},{"text":"are PSD. Nonetheless, we invoke ","element":"span"},{"text":"Oracle ","element":"span"},{"text":"using the perturbed cumulative loss ","element":"span"},{"style":{"height":18.83},"width":332.89,"height":47.07,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-9.png","element":"img","alt":" ( ˆQt + Qpt , ˆRt + Rpt )","inline":true,"padRight":true},{"text":"that might not be PSD, as the perturbations ","element":"span"},{"style":{"height":16.38},"width":48.5,"height":40.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-10.png","element":"img","alt":" Qpt","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":16.38},"width":47.57,"height":40.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-11.png","element":"img","alt":" Rpt","inline":true,"padRight":true},{"text":"themselves are typically not PSD. 
To solve this issue, we first notice that with high-probability ","element":"span"},{"href":"#id-44","referenceIndex":34,"text":"(Vershynin, ","element":"a"},{"href":"#id-44","referenceIndex":34,"text":"2010)","element":"a"},{"text":", we have ","element":"span"},{"style":{"height":16.46},"width":282.32,"height":41.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-12.png","element":"img","alt":" ∥Qpt ∥ ≤ O(d/η)","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":16.46},"width":282.64,"height":41.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-13.png","element":"img","alt":" ∥Rpt ∥ ≤ O(k/η)","inline":true},{"text":". Therefore, to guarantee that the ","element":"span"},{"text":"perturbed cumulative loss is PSD, we can add an initial large pretend loss by setting ","element":"span"},{"style":{"height":16},"width":216.2,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-14.png","element":"img","alt":"�Q1 = (d/η)I","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":16},"width":216.78,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-15.png","element":"img","alt":"�R1 = (k/η)I","inline":true},{"text":". 
This would contribute an ","element":"span"},{"style":{"height":16},"width":281.47,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-16.png","element":"img","alt":" O(Cν(d + k)/η)","inline":true,"padRight":true},{"text":"term to the regret which ensures that, by our choice of ","element":"span"},{"style":{"height":10.4},"width":20,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-17.png","element":"img","alt":" η","inline":true},{"text":", ","element":"span"},{"href":"#id-45","text":"Theorem 6.2 ","element":"a"},{"text":"still holds.","element":"span"}],[{"href":"#id-45","style":{"height":14.4},"width":672.4,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-18.png","element":"img","alt":"Proof of Theorem 6.2. Let �X1, . . . , �XT","inline":true,"padRight":true},{"text":"be the actual observed covariance matrices induced by ","element":"span"},{"href":"#id-43","text":"Algorithm 2. ","element":"a"},{"text":"Also, let ","element":"span"},{"style":{"height":15.19},"width":202.16,"height":37.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-19.png","element":"img","alt":"�X⋆1, . . . , �X⋆T ","inline":true,"padRight":true},{"text":"be the covariances induced by using a fixed control matrix ","element":"span"},{"style":{"height":10.8},"width":50.69,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-20.png","element":"img","alt":" K⋆","inline":true,"padRight":true},{"text":"throughout. Similarly, ","element":"span"},{"text":"define ","element":"span"},{"style":{"height":14},"width":195.46,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-21.png","element":"img","alt":" X1, . . . 
, XT","inline":true,"padRight":true},{"text":"to be the covariance matrices of the steady-state distributions induced by ","element":"span"},{"style":{"height":14},"width":197.12,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-22.png","element":"img","alt":" K1, . . . , KT","inline":true,"padRight":true},{"text":"respectively, and ","element":"span"},{"style":{"height":11.6},"width":265.45,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-23.png","element":"img","alt":" X⋆ that of K⋆.","inline":true,"padRight":true},{"text":"As in the analysis of OGD, the expected regret can be decomposed as follows:","element":"span"}],[{"id":"id-46","style":{"width":"77%"},"width":1398,"height":534,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-24.png","element":"img"}],[{"text":"The second term in ","element":"span"},{"href":"#id-46","text":"Eq. (10), ","element":"a"},{"text":"the regret in the “idealized setting”, is bounded due to ","element":"span"},{"href":"#id-11","referenceIndex":24,"text":"Kalai & Vempala ","element":"a"},{"href":"#id-11","referenceIndex":24,"text":"(2005)","element":"a"},{"text":". It requires the additional observation that, by ","element":"span"},{"href":"#id-47","text":"Lemma 3.3, ","element":"a"},{"text":"we have ","element":"span"},{"style":{"height":17.78},"width":538.98,"height":44.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-25.png","element":"img","alt":" Tr(X⋆)+Tr(K⋆X⋆(K⋆)T) ≤ ν.","inline":true}],[{"id":"id-48","style":{"height":16},"width":1011.91,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-26.png","element":"img","alt":"Lemma 6.4. Assume Tr(Qt), Tr(Rt) ≤ C for all t. 
Then,","inline":true}],[{"style":{"width":"57%"},"width":1031,"height":230,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-27.png","element":"img"}],[{"text":"Moreover, the probability that the algorithm changes ","element":"span"},{"style":{"height":13.19},"width":45.84,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-28.png","element":"img","alt":" Kt","inline":true,"padRight":true},{"text":"and performs a reset at any step ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"is at most ","element":"span"},{"style":{"height":17.32},"width":188.21,"height":43.3,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-29.png","element":"img","alt":"ηC√d + k.","inline":true}],[{"text":"The third term of ","element":"span"},{"href":"#id-46","text":"Eq. (10) ","element":"a"},{"text":"is bounded by ","element":"span"},{"style":{"height":17.39},"width":156.14,"height":43.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-30.png","element":"img","alt":" 2Cκ4ν/γ","inline":true,"padRight":true},{"text":"due to ","element":"span"},{"href":"#id-42","text":"Lemma 5.4. ","element":"a"},{"text":"It remains to bound the first term in the equation. 
To that end, we will next show that after the system is reset, the cost of the learner ","element":"span"},{"id":"id-49","text":"on round ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"is at most that of the steady-state induced by ","element":"span"},{"style":{"height":13.19},"width":58.88,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/9-31.png","element":"img","alt":" Kt.","inline":true}],[{"style":{"width":"61%"},"width":1118,"height":391,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/10-0.png","element":"img"}],[{"id":"id-50","text":"Figure 1: Data center cooling loop; see ","element":"figcaption","subtype":"caption"},{"text":"Section 7.","element":"span","subtype":"caption"}],[{"style":{"fontWeight":"bold"},"text":"Lemma 6.5. ","element":"span"},{"text":"Suppose the learner starts playing ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K ","element":"span"},{"text":"at state ","element":"span"},{"style":{"height":10.78},"width":162.19,"height":26.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/10-1.png","element":"img","alt":" xt0 = wt0","inline":true},{"text":". Then the expected cost of the learner is always at most the steady-state cost induced by ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K","element":"span"},{"text":".","element":"span"}],[{"style":{"width":"0%"},"width":6,"height":2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/10-2.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. 
","element":"span"},{"text":"Let ","element":"span"},{"style":{"height":10.78},"width":160.56,"height":26.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/10-3.png","element":"img","alt":" xt0 = wt0","inline":true},{"text":", and recall that ","element":"span"},{"style":{"height":16},"width":436.85,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/10-4.png","element":"img","alt":" xt+1 = (A + BK)xt + wt","inline":true},{"text":". Let ","element":"span"},{"style":{"height":13.19},"width":48.14,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/10-5.png","element":"img","alt":"�Xt","inline":true,"padRight":true},{"text":"be the covariance of ","element":"span"},{"style":{"height":9.19},"width":34.78,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/10-6.png","element":"img","alt":" xt","inline":true},{"text":", and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"be the covariance of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x ","element":"span"},{"text":"at the steady-state induced by ","element":"span"},{"style":{"height":17.79},"width":886.39,"height":44.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/10-7.png","element":"img","alt":" K. Then, Xt0 = (A + BKt0)Xt0(A + BKt0)T + W.","inline":true,"padRight":true},{"text":"We now show that ","element":"span"},{"style":{"height":13.2},"width":151.13,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/10-8.png","element":"img","alt":"�Xt ⪯ X","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"height":12.8},"width":110.74,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/10-9.png","element":"img","alt":" t ≥ t0","inline":true,"padRight":true},{"text":"by induction. 
Indeed, for the base case ","element":"span"},{"style":{"height":33.93},"width":1809.77,"height":84.82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/10-10.png","element":"img","alt":"�Xt0 = W ⪯(A + BKt0)Xt0(A + BKt0)T + W = X","inline":true},{"text":". Now assume that ","element":"span"},{"style":{"height":14.78},"width":162.36,"height":36.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/10-11.png","element":"img","alt":"�Xt ⪯ Xt0","inline":true},{"text":", that implies","element":"span"}],[{"style":{"width":"46%"},"width":839,"height":117,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/10-12.png","element":"img"}],[{"text":"Since ","element":"span"},{"style":{"height":17.72},"width":244.7,"height":44.31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/10-13.png","element":"img","alt":" Qt + KTt RtKt","inline":true,"padRight":true},{"text":"is PSD, the expected cost of the learner at time ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"is ","element":"span"},{"style":{"height":17.78},"width":412.96,"height":44.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/10-14.png","element":"img","alt":" (Qt + KTt RtKt) • Xt ≤","inline":true},{"style":{"height":17.78},"width":358.39,"height":44.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/10-15.png","element":"img","alt":"(Qt + KTt RtKt) • X.","inline":true}],[{"text":"Combining ","element":"span"},{"href":"#id-48","text":"Lemmas 6.4 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-49","text":"6.5 ","element":"a"},{"text":"obtains the theorem (see ","element":"span"},{"text":"Appendix A ","element":"span"},{"text":"for more details).","element":"span"}]]},{"heading":"7 Experiments","paragraphs":[[{"text":"We demonstrate our approach on the problem of regulating conditions inside a data center (DC) server floor in the presence of 
time-varying power costs. We learn system dynamics from a real data center, but vary the costs and run algorithms in simulation.","element":"span"}],[{"href":"#id-50","text":"Fig. 1 ","element":"a"},{"text":"shows a schematic of the cooling loop of a typical data center. Water is cooled to sub-ambient temperatures in the chiller and evaporative cooling towers, and then sent to multiple air handling units (AHUs) on the server floor. Server racks are arranged into rows with alternating hot and cold aisles, such that all hot air exhausts face the hot aisle. The AHUs circulate air through the building; hot air is cooled through air-water heat exchange and blown into the cold aisle, and the resulting warm water is sent back to the chiller and cooling towers. The primary goal of floor-level cooling is to control the cold aisle temperatures (CATs) and differential air pressures (DPs). The control vector includes the blower speed and water valve command for each of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= 30 ","element":"span"},{"text":"AHUs, set every 30s. The state vector includes ","element":"span"},{"text":"2","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"temperature measurements and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"pressure measurements, as well as sensor measurements and controls for the preceding time step. System noise is in part due to variability in server loads and the temperature of the chilled water.","element":"span"}],[{"text":"We learn a linear approximation ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"A, B","element":"span"},{"text":") ","element":"span"},{"text":"of the dynamics in the operating range of interest on 4h of exploratory data with controls following a random walk. 
We estimate the system noise covariance ","element":"span"},{"style":{"fontStyle":"italic"},"text":"W ","element":"span"},{"text":"as the empirical covariance of training data residuals. For the purpose of the experiment, we amplify the noise by a factor of 5. We set the diagonal coefficients of ","element":"span"},{"style":{"height":14},"width":43.5,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/10-16.png","element":"img","alt":" Qt","inline":true,"padRight":true},{"text":"corresponding to the most recent (normalized) sensor measurements to 1 and remaining coefficients to 0, and keep ","element":"span"},{"style":{"height":14},"width":135.33,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/10-17.png","element":"img","alt":" Qt = Q","inline":true,"padRight":true},{"text":"constant throughout the experiment. We set diagonal coefficients of ","element":"span"},{"style":{"height":13.19},"width":42.26,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/10-18.png","element":"img","alt":" Rt","inline":true,"padRight":true},{"text":"corresponding to water usage (valve command) to 1 throughout, and all coefficients corresponding to power usage (fan speed) to ","element":"span"},{"style":{"height":9.19},"width":29.98,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/10-19.png","element":"img","alt":" rt","inline":true},{"text":". 
We generate ","element":"span"},{"style":{"height":9.19},"width":29.98,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/10-20.png","element":"img","alt":" rt","inline":true,"padRight":true},{"text":"by (a) sampling i.i.d. from a uniform distribution on ","element":"span"},{"text":"[0","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"text":"1]","element":"span"},{"text":", and (b) using a random walk restricted to ","element":"span"},{"text":"[0","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"text":"1] ","element":"span"},{"text":"taking steps of size ","element":"span"},{"style":{"height":14},"width":188.26,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/10-21.png","element":"img","alt":" 0.1, −0.1, 0","inline":true,"padRight":true},{"text":"with probabilities ","element":"span"},{"style":{"height":14},"width":219.25,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/10-22.png","element":"img","alt":" 0.1, 0.1, 0.8","inline":true,"padRight":true},{"text":"respectively.","element":"span"}],[{"style":{"width":"60%"},"width":1088,"height":860,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/11-0.png","element":"img"}],[{"id":"id-51","text":"Figure 2: Normalized regret ","element":"figcaption","subtype":"caption"},{"style":{"height":16},"width":104.29,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/11-1.png","element":"img","alt":" RT /T","inline":true,"padRight":true},{"text":"for FLL and Recent strategies, with power costs generated uniformly (top) and by random walk (bottom). 
Resets occur at time steps indicated by dashed lines.","element":"figcaption","subtype":"caption"}],[{"text":"We run the FLL algorithm on this problem with the following modifications: we set ","element":"span"},{"style":{"height":16.71},"width":138.47,"height":41.78,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/11-2.png","element":"img","alt":" Qp1 = Q","inline":true},{"text":", and ","element":"span"},{"style":{"height":16.71},"width":142.4,"height":41.78,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/11-3.png","element":"img","alt":"Rp1 = Ik","inline":true},{"text":", an upper bound on ","element":"span"},{"style":{"height":13.19},"width":42.26,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/11-4.png","element":"img","alt":" Rt","inline":true},{"text":". Rather than executing hard resets to 0, we perform a soft reset by ","element":"span"},{"text":"running a policy ","element":"span"},{"style":{"height":13.19},"width":106.77,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/11-5.png","element":"img","alt":" Kreset","inline":true,"padRight":true},{"text":"for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"steps. 
Here ","element":"span"},{"style":{"height":13.19},"width":106.77,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/11-6.png","element":"img","alt":" Kreset","inline":true,"padRight":true},{"text":"is similar to the next FLL policy, but based on 1.1 times the corresponding state cost ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"text":".","element":"span"}],[{"text":"We compare the cost of FLL to that of a fixed linear controller that is based on the average of the ","element":"span"},{"style":{"height":13.19},"width":42.26,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/11-7.png","element":"img","alt":"Rt","inline":true,"padRight":true},{"text":"matrices, and to a ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Recent ","element":"span"},{"text":"strategy which selects one of ten controllers corresponding to power costs in ","element":"span"},{"style":{"height":16},"width":323.35,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/11-8.png","element":"img","alt":" r ∈ {0.1, 0.2, ..., 1}","inline":true,"padRight":true},{"text":"based on the most recently observed ","element":"span"},{"style":{"height":13.19},"width":42.26,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/11-9.png","element":"img","alt":" Rt","inline":true},{"text":". The normalized regret ","element":"span"},{"style":{"height":19.37},"width":81.15,"height":48.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/11-10.png","element":"img","alt":"1T RT","inline":true,"padRight":true},{"text":"of the ","element":"span"},{"text":"two strategies is shown in ","element":"span"},{"href":"#id-51","text":"Fig. 2. 
","element":"a"},{"text":"FLL performance quickly approaches that of the fixed linear policy in both cases, and is better than the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Recent ","element":"span"},{"text":"strategy on uniform random costs. The ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Recent ","element":"span"},{"text":"strategy has an advantage in the case where costs vary slowly, and empirical performance of FLL could likely be improved in this case by forgetting the old costs.","element":"span"}]]},{"heading":"References","paragraphs":[[{"id":"id-14","text":"Abbasi, Y., Bartlett, P. L., Kanade, V., Seldin, Y., and Szepesvári, C. Online learning in markov ","element":"span"},{"text":"decision processes with adversarially chosen transition probability distributions. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Advances in neural information processing systems","element":"span"},{"text":", pp. 2508–2516, 2013.","element":"span"}],[{"id":"id-22","text":"Abbasi-Yadkori, Y. and Szepesvári, C. Regret bounds for the adaptive control of linear quadratic systems. ","element":"span"},{"text":"In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of the 24th Annual Conference on Learning Theory","element":"span"},{"text":", pp. 1–26, 2011.","element":"span"}],[{"id":"id-9","text":"Abbasi-Yadkori, Y., Bartlett, P., and Kanade, V. Tracking adversarial targets. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"International Conference on Machine Learning","element":"span"},{"text":", pp. 369–377, 2014.","element":"span"}],[{"id":"id-3","text":"Abbeel, P., Coates, A., Quigley, M., and Ng, A. Y. An application of reinforcement learning to aerobatic ","element":"span"},{"text":"helicopter flight. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Advances in neural information processing systems","element":"span"},{"text":", pp. 
1–8, 2007.","element":"span"}],[{"text":"Abeille, M. and Lazaric, A. Thompson sampling for linear-quadratic control problems. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"AISTATS","element":"span"},{"text":", 2017.","element":"span"}],[{"id":"id-0","text":"Anderson, B., Moore, J., and Molinari, B. Linear optimal control. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"IEEE Transactions on Systems, Man, and Cybernetics","element":"span"},{"text":", (4):559–559, 1972.","element":"span"}],[{"id":"id-21","text":"Arora, S., Hazan, E., Lee, H., Singh, K., Zhang, C., and Zhang, Y. Towards provable control for ","element":"span"},{"text":"unknown linear dynamical systems. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"International Conference on Learning Representations","element":"span"},{"text":", 2018. URL ","element":"span"},{"href":"https://openreview.net/forum?id=BygpQlbA-","style":{"fontFamily":"monospace"},"text":"https://openreview.net/forum?id=BygpQlbA-","element":"a"},{"text":". workshop track.","element":"span"}],[{"text":"Åström, K. J. and Wittenmark, B. On self tuning regulators. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Automatica","element":"span"},{"text":", 9(2):185–199, 1973.","element":"span"}],[{"text":"Auer, P. and Ortner, R. Logarithmic online regret bounds for undiscounted reinforcement learning. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Advances in Neural Information Processing Systems","element":"span"},{"text":", pp. 49–56, 2007.","element":"span"}],[{"id":"id-24","text":"Balakrishnan, V. and Vandenberghe, L. Semidefinite programming duality and linear time-invariant ","element":"span"},{"text":"systems. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"IEEE Transactions on Automatic Control","element":"span"},{"text":", 48(1):30–41, 2003.","element":"span"}],[{"id":"id-1","text":"Bertsekas, D. P. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Dynamic programming and optimal control","element":"span"},{"text":", volume 1. Athena scientific Belmont, MA, 1995.","element":"span"}],[{"text":"Bittanti, S. and Campi, M. C. Adaptive control of linear time invariant systems: the bet on the best principle. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Communications in Information & Systems","element":"span"},{"text":", 6(4):299–320, 2006.","element":"span"}],[{"text":"Bradtke, S. J. Reinforcement learning applied to linear quadratic regulation. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Advances in neural information processing systems","element":"span"},{"text":", pp. 295–302, 1993.","element":"span"}],[{"text":"Campi, M. C. and Kumar, P. Adaptive linear quadratic gaussian control: the cost-biased approach revisited. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"SIAM Journal on Control and Optimization","element":"span"},{"text":", 36(6):1890–1907, 1998.","element":"span"}],[{"id":"id-6","text":"Cesa-Bianchi, N. and Lugosi, G. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Prediction, learning, and games","element":"span"},{"text":". Cambridge university press, 2006.","element":"span"}],[{"id":"id-19","text":"Dean, S., Mania, H., Matni, N., Recht, B., and Tu, S. On the sample complexity of the linear quadratic ","element":"span"},{"text":"regulator. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv preprint arXiv:1710.01688","element":"span"},{"text":", 2017.","element":"span"}],[{"id":"id-25","text":"Dvijotham, K., Todorov, E., and Fazel, M. Convex control design via covariance minimization. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Communication, Control, and Computing (Allerton), 2013 51st Annual Allerton Conference on","element":"span"},{"text":", pp. 93–99. IEEE, 2013.","element":"span"}],[{"id":"id-12","text":"Even-Dar, E., Kakade, S. 
M., and Mansour, Y. Online markov decision processes. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Mathematics of Operations Research","element":"span"},{"text":", 34(3):726–736, 2009.","element":"span"}],[{"id":"id-18","text":"Fazel, M., Ge, R., Kakade, S. M., and Mesbahi, M. Global convergence of policy gradient methods for ","element":"span"},{"text":"linearized control problems. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv preprint arXiv:1801.05039","element":"span"},{"text":", 2018.","element":"span"}],[{"text":"Gao, J. and Jamidar, R. Machine learning applications for data center optimization. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Google White Paper","element":"span"},{"text":", 2014.","element":"span"}],[{"id":"id-7","text":"Hazan, E. Introduction to online convex optimization. ","element":"span"},{"style":{"height":15.2},"width":768.36,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/12-0.png","element":"img","alt":" Foundations and Trends R⃝ in Optimization","inline":true},{"text":", 2 (3-4):157–325, 2016.","element":"span"}],[{"id":"id-20","text":"Hazan, E., Singh, K., and Zhang, C. Learning linear dynamical systems via spectral filtering. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Advances in Neural Information Processing Systems","element":"span"},{"text":", pp. 6705–6715, 2017.","element":"span"}],[{"id":"id-23","text":"Ibrahimi, M., Javanmard, A., and Roy, B. V. Efficient reinforcement learning for high dimensional linear ","element":"span"},{"text":"quadratic systems. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Advances in Neural Information Processing Systems 25","element":"span"},{"text":", pp. 2636–2644. Curran Associates, Inc., 2012.","element":"span"}],[{"id":"id-11","text":"Kalai, A. and Vempala, S. Efficient algorithms for online decision problems. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Computer and System Sciences","element":"span"},{"text":", 71(3):291–307, 2005.","element":"span"}],[{"id":"id-26","text":"Lee, D.-H. and Hu, J. A semidefinite programming formulation of the lqr problem and its dual. 2016.","element":"span"}],[{"id":"id-27","text":"Lee, J.-W. and Khargonekar, P. P. Constrained infinite-horizon linear quadratic regulation of discrete-time ","element":"span"},{"text":"systems. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"IEEE Transactions on Automatic Control","element":"span"},{"text":", 52(10):1951–1958, 2007.","element":"span"}],[{"id":"id-4","text":"Levine, S., Finn, C., Darrell, T., and Abbeel, P. End-to-end training of deep visuomotor policies. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The Journal of Machine Learning Research","element":"span"},{"text":", 17(1):1334–1373, 2016.","element":"span"}],[{"text":"Lewis, F. L. and Vrabie, D. Reinforcement learning and adaptive dynamic programming for feedback control. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"IEEE circuits and systems magazine","element":"span"},{"text":", 9(3), 2009.","element":"span"}],[{"id":"id-16","text":"Neu, G. and Gómez, V. Fast rates for online learning in linearly solvable markov decision processes. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of Machine Learning Research vol","element":"span"},{"text":", 65:1–22, 2017.","element":"span"}],[{"id":"id-28","text":"Schildbach, G., Goulart, P., and Morari, M. Linear controller design for chance constrained systems. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Automatica","element":"span"},{"text":", 51:278–284, 2015.","element":"span"}],[{"id":"id-8","text":"Shalev-Shwartz, S. Online learning and online convex optimization. 
","element":"span"},{"style":{"height":15.2},"width":652.94,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/13-0.png","element":"img","alt":" Foundations and Trends R⃝ in Machine","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"Learning","element":"span"},{"text":", 4(2):107–194, 2012.","element":"span"}],[{"id":"id-5","text":"Sheckells, M., Garimella, G., and Kobilarov, M. Robust policy search with applications to safe vehicle ","element":"span"},{"text":"navigation. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Robotics and Automation (ICRA), 2017 IEEE International Conference on","element":"span"},{"text":", pp. 2343– 2349. IEEE, 2017.","element":"span"}],[{"id":"id-17","text":"Todorov, E. Efficient computation of optimal actions. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of the national academy of sciences","element":"span"},{"text":", 106(28):11478–11483, 2009.","element":"span"}],[{"id":"id-44","text":"Vershynin, R. ","element":"span"},{"text":"Introduction to the non-asymptotic analysis of random matrices. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv preprint arXiv:1011.3027","element":"span"},{"text":", 2010.","element":"span"}],[{"id":"id-13","text":"Yu, J. Y., Mannor, S., and Shimkin, N. Markov decision processes with arbitrary reward processes. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Mathematics of Operations Research","element":"span"},{"text":", 34(3):737–757, 2009.","element":"span"}],[{"id":"id-2","text":"Zhou, K., Doyle, J. C., Glover, K., et al. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Robust and optimal control","element":"span"},{"text":", volume 40. Prentice hall New Jersey, 1996.","element":"span"}],[{"id":"id-10","text":"Zinkevich, M. Online convex programming and generalized infinitesimal gradient ascent. 
In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of the 20th International Conference on Machine Learning (ICML-03)","element":"span"},{"text":", pp. 928–936, 2003.","element":"span"}]]},{"heading":"A Technical Proofs","paragraphs":[[{"style":{"fontWeight":"bold"},"text":"A.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of ","element":"span"},{"href":"#id-47","style":{"fontWeight":"bold"},"text":"Lemma 3.3","element":"a"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"Strong stability ensures that ","element":"span"},{"style":{"height":16},"width":409.7,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/13-1.png","element":"img","alt":" ρ(A + BK) < 1, and so","inline":true}],[{"style":{"width":"35%"},"width":633,"height":110,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/13-2.png","element":"img"}],[{"text":"Write ","element":"span"},{"style":{"height":17.39},"width":1225.74,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/13-3.png","element":"img","alt":" A + BK = HLH−1 such that ∥L∥ ≤ 1 − γ and ∥H∥∥H−1∥ ≤ κ. 
Then","inline":true}],[{"style":{"width":"43%"},"width":777,"height":46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/13-4.png","element":"img"}],[{"text":"As a result,","element":"span"}],[{"style":{"width":"42%"},"width":773,"height":243,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/13-5.png","element":"img"}],[{"text":"Further, notice that ","element":"span"},{"style":{"height":16.99},"width":364.48,"height":42.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/13-6.png","element":"img","alt":" U = KXKT, whence","inline":true}],[{"style":{"width":"73%"},"width":1326,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/13-7.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"A.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of ","element":"span"},{"href":"#id-52","style":{"fontWeight":"bold"},"text":"Lemma 3.5","element":"a"}],[{"style":{"height":16.4},"width":707,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/14-0.png","element":"img","alt":"Proof. Denote Ct = Cov[ut|xt] (where ut","inline":true,"padRight":true},{"text":"is the action taken on round ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":"). 
By definition, for all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"we have","element":"span"}],[{"style":{"width":"47%"},"width":867,"height":117,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/14-1.png","element":"img"}],[{"text":"Subtracting the equations, substituting ","element":"span"},{"style":{"height":18.34},"width":377.4,"height":45.86,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/14-2.png","element":"img","alt":" A + BKt = HtLtH−1t","inline":true,"padRight":true},{"text":"and rearranging yields","element":"span"}],[{"style":{"width":"53%"},"width":960,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/14-3.png","element":"img"}],[{"text":"Denote ","element":"span"},{"style":{"height":18.42},"width":629.7,"height":46.05,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/14-4.png","element":"img","alt":" ∆t = H−1t (X̂t − Xt)(H−1t )T for all t","inline":true},{"text":". Then the above can be rewritten as","element":"span"}],[{"id":"id-53","style":{"width":"67%"},"width":1223,"height":116,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/14-5.png","element":"img"}],[{"text":"Let us first analyze the simpler case where all policies ","element":"span"},{"style":{"height":13.19},"width":45.84,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/14-6.png","element":"img","alt":" Kt","inline":true,"padRight":true},{"text":"converge to the same steady-state covariance ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":". Then ","element":"span"},{"style":{"height":13.59},"width":279.91,"height":33.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/14-7.png","element":"img","alt":" Xt = X for all t","inline":true},{"text":", thus ","element":"span"},{"href":"#id-53","text":"Eq. 
(11) ","element":"a"},{"text":"reads","element":"span"}],[{"style":{"width":"34%"},"width":628,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/14-8.png","element":"img"}],[{"text":"Taking norms, we obtain","element":"span"}],[{"style":{"width":"31%"},"width":565,"height":179,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/14-9.png","element":"img"}],[{"text":"whence ","element":"span"},{"style":{"height":16.98},"width":347.53,"height":42.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/14-10.png","element":"img","alt":" ∥∆t+1∥ ≤ e−γt∥∆1∥","inline":true},{"text":". Recalling ","element":"span"},{"style":{"height":17.72},"width":336.2,"height":44.31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/14-11.png","element":"img","alt":"X̂t − X = Ht∆tHTt ","inline":true,"padRight":true},{"text":", we obtain","element":"span"}],[{"style":{"width":"44%"},"width":795,"height":178,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/14-12.png","element":"img"}],[{"text":"For the general case, taking norms in ","element":"span"},{"href":"#id-53","text":"Eq. (11) ","element":"a"},{"text":"results in","element":"span"}],[{"style":{"width":"46%"},"width":841,"height":139,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/14-13.png","element":"img"}],[{"text":"and unfolding the recursion we obtain","element":"span"}],[{"style":{"width":"99%"},"width":1803,"height":294,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/14-14.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"A.3 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of ","element":"span"},{"href":"#id-54","style":{"fontWeight":"bold"},"text":"Lemma 4.4","element":"a"}],[{"style":{"width":"64%"},"width":1165,"height":169,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/14-15.png","element":"img"}],[{"text":"(cf. 
","element":"span"},{"href":"#id-33","text":"Eq. (5))","element":"a"},{"text":". Now, since ","element":"span"},{"style":{"height":13.19},"width":40.78,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/14-16.png","element":"img","alt":" Σt","inline":true,"padRight":true},{"text":"is feasible for the SDP we have","element":"span"}],[{"style":{"width":"36%"},"width":665,"height":194,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/14-17.png","element":"img"}],[{"text":"Proceeding as in the proof of ","element":"span"},{"href":"#id-38","text":"Lemma 4.3, ","element":"a"},{"text":"one can show that ","element":"span"},{"style":{"height":16},"width":191.71,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-0.png","element":"img","alt":" ∥Kt∥F ≤ κ","inline":true},{"text":", and that the matrix ","element":"span"},{"style":{"height":13.19},"width":87.68,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-1.png","element":"img","alt":" Lt =","inline":true},{"style":{"height":20.68},"width":386.78,"height":51.7,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-2.png","element":"img","alt":"X−1/2t (A + BKt)X1/2t","inline":true,"padRight":true},{"text":"satisfies ","element":"span"},{"style":{"height":17.38},"width":303.96,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-3.png","element":"img","alt":" ∥Lt∥ ≤ 1 − 1/2κ2","inline":true,"padRight":true},{"text":"with ","element":"span"},{"style":{"height":20.68},"width":234.32,"height":51.7,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-4.png","element":"img","alt":" ∥X1/2t ∥ ≤ √ν","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":20.68},"width":268.27,"height":51.7,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-5.png","element":"img","alt":" ∥X−1/2t ∥ ≤ 
1/σ","inline":true},{"text":". To establish sequential strong stability it thus suffices to show that ","element":"span"},{"style":{"height":22.53},"width":458.41,"height":56.32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-6.png","element":"img","alt":" ∥X−1/2t+1 X1/2t ∥ ≤ 1 + 1/4κ2","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":". To this end, observe that ","element":"span"},{"style":{"height":17.39},"width":403.61,"height":43.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-7.png","element":"img","alt":" ∥Xt+1 − Xt∥ ≤ η, and","inline":true}],[{"style":{"width":"51%"},"width":938,"height":360,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-8.png","element":"img"}],[{"text":"Hence, if ","element":"span"},{"style":{"height":22.53},"width":981.25,"height":56.33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-9.png","element":"img","alt":" η ≤ σ2/κ2 then ∥X−1/2t+1 X1/2t ∥ ≤ √(1 + 1/κ2) ≤ 1 + 1/2κ2 ","inline":true,"padRight":true},{"text":"as required.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"A.4 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of ","element":"span"},{"href":"#id-41","style":{"fontWeight":"bold"},"text":"Lemma 5.2","element":"a"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"The diameter of the feasible domain of the SDP (with respect to ","element":"span"},{"style":{"height":16},"width":88.51,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-10.png","element":"img","alt":" ∥ · ∥F","inline":true},{"text":") is upper bounded by ","element":"span"},{"style":{"height":10.8},"width":41.33,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-11.png","element":"img","alt":"2ν","inline":true},{"text":". 
Also, the linear loss function ","element":"span"},{"style":{"height":13.19},"width":218.89,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-12.png","element":"img","alt":" X ↦ Lt • X","inline":true,"padRight":true},{"text":"is ","element":"span"},{"style":{"height":16},"width":368.16,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-13.png","element":"img","alt":" ∥Qt∥F + ∥Rt∥F ≤ 2C","inline":true,"padRight":true},{"text":"Lipschitz for all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"(again, with respect to ","element":"span"},{"style":{"height":16},"width":84.31,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-14.png","element":"img","alt":" ∥ · ∥F","inline":true},{"text":"). Plugging this into the regret bound of the Online Gradient Descent algorithm gives the lemma.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"A.5 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of ","element":"span"},{"href":"#id-55","style":{"fontWeight":"bold"},"text":"Lemma 5.3","element":"a"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"Denote the policy used by the algorithm on round ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"by ","element":"span"},{"style":{"height":16},"width":307.74,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-15.png","element":"img","alt":" πt(x) = Ktx + vt","inline":true,"padRight":true},{"text":"with ","element":"span"},{"style":{"height":16.4},"width":240.42,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-16.png","element":"img","alt":" vt ∼ N(0, Vt)","inline":true},{"text":". 
Notice that","element":"span"}],[{"id":"id-58","style":{"width":"56%"},"width":1027,"height":130,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-17.png","element":"img"}],[{"text":"whence ","element":"span"},{"style":{"height":16},"width":191.42,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-18.png","element":"img","alt":" E(πt) = Σt","inline":true},{"text":". Next, denote ","element":"span"},{"style":{"height":16},"width":454.31,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-19.png","element":"img","alt":" Xt = (Σt)xx, Ut = (Σt)uu","inline":true},{"text":", and similarly ","element":"span"},{"style":{"height":16},"width":217.79,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-20.png","element":"img","alt":"X̂t = (Σ̂t)xx","inline":true},{"text":", ","element":"span"},{"style":{"height":16},"width":214.95,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-21.png","element":"img","alt":"Ût = (Σ̂t)uu","inline":true},{"text":". 
Observe that ","element":"span"},{"style":{"height":17.73},"width":860.2,"height":44.32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-22.png","element":"img","alt":"Ût = KtX̂tKTt + Vt and Ut = KtXtKTt + Vt, thus","inline":true}],[{"style":{"width":"46%"},"width":849,"height":183,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-23.png","element":"img"}],[{"text":"Further, for ","element":"span"},{"style":{"height":16},"width":181.04,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-24.png","element":"img","alt":" K = K(Σ)","inline":true,"padRight":true},{"text":"for any feasible ","element":"span"},{"style":{"height":12},"width":255.4,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-25.png","element":"img","alt":" Σ ∈ S we have","inline":true}],[{"id":"id-56","style":{"width":"99%"},"width":1800,"height":331,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-26.png","element":"img"}],[{"text":"It remains to control the norms ","element":"span"},{"style":{"height":16},"width":185.89,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-27.png","element":"img","alt":" ∥X̂t − Xt∥","inline":true},{"text":". To this end, recall ","element":"span"},{"href":"#id-54","text":"Lemma 4.4 ","element":"a"},{"text":"which asserts that the sequence ","element":"span"},{"style":{"height":15.6},"width":327.44,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-28.png","element":"img","alt":"K1, K2, . . . is (¯κ, ¯γ)","inline":true},{"text":"-strongly stable, since we assume ","element":"span"},{"style":{"height":17.39},"width":175.31,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-29.png","element":"img","alt":" η ≤ σ2/¯κ2","inline":true},{"text":". 
Now, since ","element":"span"},{"style":{"height":16},"width":513.85,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-30.png","element":"img","alt":" ∥Xt+1 − Xt∥ ≤ ∥Σt+1 − Σt∥ ≤","inline":true},{"style":{"height":14.4},"width":71.29,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-31.png","element":"img","alt":"4Cη","inline":true},{"text":", applying ","element":"span"},{"href":"#id-52","text":"Lemma 3.5 ","element":"a"},{"text":"to the sequence of randomized policies ","element":"span"},{"style":{"height":10},"width":163.06,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-32.png","element":"img","alt":" π1, π2, . . .","inline":true,"padRight":true},{"text":"yields","element":"span"}],[{"id":"id-57","style":{"width":"69%"},"width":1260,"height":129,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-33.png","element":"img"}],[{"text":"We can further bound the right-hand side using ","element":"span"},{"style":{"height":16},"width":291.4,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-34.png","element":"img","alt":" ∥X̂1 − X1∥ ≤ 2ν","inline":true},{"text":". Combining ","element":"span"},{"href":"#id-56","text":"Eqs. 
(13) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-57","text":"(14) ","element":"a"},{"text":"and using the fact that ","element":"span"},{"style":{"height":21.28},"width":706.4,"height":53.21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/15-35.png","element":"img","alt":"∑Tt=1 e−αt ≤ ∫∞0 e−αt dt = 1/α for α > 0","inline":true},{"text":", we obtain the result.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"A.6 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of ","element":"span"},{"href":"#id-42","style":{"fontWeight":"bold"},"text":"Lemma 5.4","element":"a"}],[{"style":{"width":"79%"},"width":1431,"height":175,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/16-0.png","element":"img"}],[{"text":"Thus ","element":"span"},{"style":{"height":17.78},"width":815.02,"height":44.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/16-1.png","element":"img","alt":" Lt • (Σ⋆ − Σ̂⋆t) = (Qt + (K⋆)TRtK⋆) • (X − Xt)","inline":true},{"text":". Now, ","element":"span"},{"href":"#id-47","text":"Lemma 3.3 ","element":"a"},{"text":"asserts that ","element":"span"},{"style":{"height":17.38},"width":359.98,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/16-2.png","element":"img","alt":" Tr(Σ⋆) ≤ 2κ4/γ = ν,","inline":true,"padRight":true},{"text":"hence ","element":"span"},{"style":{"height":11.6},"width":121.83,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/16-3.png","element":"img","alt":" Σ⋆ ∈ S","inline":true,"padRight":true},{"text":"and by ","element":"span"},{"href":"#id-58","text":"Eq. 
(12), ","element":"a"},{"text":"it follows that","element":"span"}],[{"style":{"width":"40%"},"width":724,"height":116,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/16-4.png","element":"img"}],[{"text":"Now, an application of ","element":"span"},{"href":"#id-59","text":"Lemma 3.2 ","element":"a"},{"text":"gives","element":"span"}],[{"style":{"width":"53%"},"width":969,"height":117,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/16-5.png","element":"img"}],[{"text":"where in the last inequality we have again used the fact that ","element":"span"},{"style":{"height":20.4},"width":297.78,"height":50.99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/16-6.png","element":"img","alt":"∑Tt=1 e−γt ≤ 1/γ","inline":true},{"text":". Finally, we have ","element":"span"},{"style":{"height":16},"width":515.55,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/16-7.png","element":"img","alt":"∥X1 − X∥ ≤ ∥Σ⋆ − Σ̂⋆0∥ ≤ 2ν.","inline":true,"padRight":true},{"text":"Combining the inequalities gives the result.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"A.7 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of ","element":"span"},{"href":"#id-37","style":{"fontWeight":"bold"},"text":"Theorem 5.1","element":"a"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"Plugging the bounds we established in ","element":"span"},{"href":"#id-41","text":"Lemmas 5.2 ","element":"a"},{"text":"to ","element":"span"},{"href":"#id-42","text":"5.4 ","element":"a"},{"text":"into ","element":"span"},{"href":"#id-39","text":"Eq. 
(8) ","element":"a"},{"text":"and setting the values for ","element":"span"},{"style":{"height":17.38},"width":445.3,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/16-8.png","element":"img","alt":" ¯κ = √ν/σ and ¯γ = σ2/2ν","inline":true,"padRight":true},{"text":"(and using ","element":"span"},{"style":{"height":15.78},"width":115.55,"height":39.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/16-9.png","element":"img","alt":" ν ≥ σ2 ","inline":true,"padRight":true},{"text":"to simplify), we obtain","element":"span"}],[{"style":{"width":"54%"},"width":989,"height":117,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/16-10.png","element":"img"}],[{"text":"for any ","element":"span"},{"style":{"height":10.4},"width":20,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/16-11.png","element":"img","alt":" η","inline":true,"padRight":true},{"text":"such that ","element":"span"},{"style":{"height":17.38},"width":423.03,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/16-12.png","element":"img","alt":" η ≤ σ2/4C¯κ2 = σ4/4Cν","inline":true},{"text":". 
Thus, a choice of ","element":"span"},{"style":{"height":18.3},"width":307.56,"height":45.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/16-13.png","element":"img","alt":" η = σ3/(2C√νT)","inline":true,"padRight":true},{"text":"(for which it can be verified that ","element":"span"},{"style":{"height":17.38},"width":468.32,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/16-14.png","element":"img","alt":" η ≤ σ4/4Cν for T ≥ 4ν/σ2","inline":true},{"text":") gives the regret bound","element":"span"}],[{"style":{"width":"49%"},"width":893,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/16-15.png","element":"img"}],[{"text":"Finally, plugging in ","element":"span"},{"style":{"height":17.39},"width":219.17,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/16-16.png","element":"img","alt":" ν = 2κ4λ2/γ","inline":true,"padRight":true},{"text":"gives the result.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"A.8 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of ","element":"span"},{"href":"#id-45","style":{"fontWeight":"bold"},"text":"Theorem 6.2","element":"a"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"Since after a reset, the system starts at state ","element":"span"},{"text":"0","element":"span"},{"text":", the cost of the learner is always less than the steady-state cost. 
The expected number of switches is at most ","element":"span"},{"style":{"height":17.32},"width":206.21,"height":43.3,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/16-17.png","element":"img","alt":" ηC√d + kT","inline":true},{"text":", and whenever a switch occurs we pay an additional cost of ","element":"span"},{"style":{"height":13.19},"width":43.48,"height":32.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/16-18.png","element":"img","alt":" Cr","inline":true,"padRight":true},{"text":"for performing a reset. Combining that with our bounds on the three terms in ","element":"span"},{"href":"#id-46","text":"Eq. (10), ","element":"a"},{"text":"we get","element":"span"}],[{"style":{"width":"99%"},"width":1802,"height":274,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/16-19.png","element":"img"}]]},{"heading":"B On Strong Stability","paragraphs":[[{"text":"In this section, we give additional justification for the stability assumption. The following lemma shows that for any stable controller ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K","element":"span"},{"text":", there are finite bounds on its strong stability parameters.","element":"span"}],[{"id":"id-29","style":{"fontWeight":"bold"},"text":"Lemma B.1. ","element":"span"},{"text":"Suppose that for a linear system defined by ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A, B","element":"span"},{"text":", a policy ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K ","element":"span"},{"text":"is stable. 
Then there are parameters ","element":"span"},{"style":{"height":14.4},"width":136.65,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/16-20.png","element":"img","alt":" κ, γ > 0","inline":true,"padRight":true},{"text":"for which it is ","element":"span"},{"style":{"height":16},"width":95.01,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/16-21.png","element":"img","alt":" (κ, γ)","inline":true},{"text":"-strongly stable.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"A theorem of Lyapunov says that a matrix ","element":"span"},{"style":{"fontStyle":"italic"},"text":"M ","element":"span"},{"text":"is stable, i.e., its spectral radius is smaller than ","element":"span"},{"text":"1, ","element":"span"},{"text":"if and only if there exists a positive definite matrix ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P ","element":"span"},{"text":"such that","element":"span"}],[{"style":{"width":"12%"},"width":224,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/17-0.png","element":"img"}],[{"text":"Indeed, ","element":"span"},{"style":{"height":18.57},"width":372.1,"height":46.42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/17-1.png","element":"img","alt":" P = ∑∞i=0 (Mi)T(Mi)","inline":true,"padRight":true},{"text":"satisfies this condition. Let ","element":"span"},{"style":{"height":15.6},"width":340.86,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/17-2.png","element":"img","alt":" ρ(A + BK) = 1 − γ","inline":true},{"text":". 
Applying the above result ","element":"span"},{"text":"to ","element":"span"},{"style":{"height":17.39},"width":344.27,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/17-3.png","element":"img","alt":" (1 − γ)−1(A + BK)","inline":true},{"text":", we conclude that for some positive definite matrix ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"text":",","element":"span"}],[{"style":{"width":"35%"},"width":633,"height":46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/17-4.png","element":"img"}],[{"text":"Pre- and post-multiplying by ","element":"span"},{"style":{"height":15.67},"width":74.81,"height":39.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/17-5.png","element":"img","alt":" P−1/2","inline":true,"padRight":true},{"text":"and rearranging,","element":"span"}],[{"style":{"width":"51%"},"width":932,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/17-6.png","element":"img"}],[{"text":"Letting ","element":"span"},{"style":{"height":19.67},"width":389.94,"height":49.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/17-7.png","element":"img","alt":" Q = P1/2(A + BK)P−1/2","inline":true,"padRight":true},{"text":", we conclude that ","element":"span"},{"style":{"height":19.67},"width":946.35,"height":49.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/17-8.png","element":"img","alt":" A + BK = P−1/2QP1/2 with ∥Q∥ ≤ 1 − γ. 
Letting κ be","inline":true,"padRight":true},{"text":"the condition number of ","element":"span"},{"style":{"height":15.67},"width":49.9,"height":39.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/17-9.png","element":"img","alt":" P1/2","inline":true,"padRight":true},{"text":", the claim follows.","element":"span"}],[{"text":"In the following sections, we give quantitative bounds on the strong stability parameters of optimal policies ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K","element":"span"},{"text":", under certain more graspable assumptions on the system.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"B.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Invertible ","element":"span"},{"style":{"fontStyle":"italic","fontWeight":"bold"},"text":"B","element":"span"}],[{"text":"As a warmup, we start with the setting where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B ","element":"span"},{"text":"is invertible. We will show quantitative bounds on the trace bound ","element":"span"},{"style":{"height":6.8},"width":21,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/17-10.png","element":"img","alt":" ν","inline":true,"padRight":true},{"text":"such that the optimal policy ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K ","element":"span"},{"text":"will be feasible for the SDP. A quantitative bound on the strong stability parameters will then follow from Lemma ","element":"span"},{"href":"#id-38","text":"4.3","element":"a"},{"text":".","element":"span"}],[{"id":"id-61","style":{"fontWeight":"bold"},"text":"Lemma B.2. 
","element":"span"},{"text":"Assume that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B ","element":"span"},{"text":"is square and invertible, and ","element":"span"},{"style":{"height":16},"width":757.06,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/17-11.png","element":"img","alt":" Tr(Q), Tr(R) ≤ C and Q, R ⪰ µI. Then for","inline":true}],[{"style":{"width":"23%"},"width":427,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/17-12.png","element":"img"}],[{"text":"the SDP is feasible and the trace constraint is not binding.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"Consider the control matrix ","element":"span"},{"style":{"height":16.58},"width":387.34,"height":41.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/17-13.png","element":"img","alt":" K0 = −B−1A, and let","inline":true}],[{"style":{"width":"49%"},"width":893,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/17-14.png","element":"img"}],[{"text":"Then ","element":"span"},{"style":{"height":13.19},"width":44.78,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/17-15.png","element":"img","alt":" Σ0","inline":true,"padRight":true},{"text":"is PSD and, as ","element":"span"},{"style":{"height":13.99},"width":235.69,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/17-16.png","element":"img","alt":" A + BK0 = 0","inline":true},{"text":", also satisfies","element":"span"}],[{"style":{"width":"50%"},"width":904,"height":120,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/17-17.png","element":"img"}],[{"text":"Further, we have","element":"span"}],[{"style":{"width":"32%"},"width":594,"height":175,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/17-18.png","element":"img"}],[{"text":"On the other hand, 
","element":"span"},{"style":{"height":19.78},"width":559.58,"height":49.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/17-19.png","element":"img","alt":" J(Σ0) = (Q 0; 0 R) • Σ0 ≥ µ Tr(Σ0),","inline":true,"padRight":true},{"text":"where we have used our assumption that ","element":"span"},{"style":{"height":14},"width":188.56,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/17-20.png","element":"img","alt":" Q, R ⪰ µI.","inline":true,"padRight":true},{"text":"Combining the two inequalities, we see that ","element":"span"},{"style":{"height":16},"width":192.86,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/17-21.png","element":"img","alt":" Tr(Σ0) ≤ ν","inline":true},{"text":". Thus, we have proved that ","element":"span"},{"style":{"height":13.19},"width":44.78,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/17-22.png","element":"img","alt":" Σ0","inline":true,"padRight":true},{"text":"is feasible.","element":"span"}],[{"text":"Finally, to see that the constraint ","element":"span"},{"style":{"height":16},"width":174.74,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/17-23.png","element":"img","alt":" Tr(Σ) ≤ ν","inline":true,"padRight":true},{"text":"is not binding, consider the optimal solution ","element":"span"},{"style":{"height":11.6},"width":176.52,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/17-24.png","element":"img","alt":" Σ⋆ for the","inline":true,"padRight":true},{"text":"SDP excluding this constraint (which is, of course, also feasible). 
Then, as before, ","element":"span"},{"style":{"height":16},"width":312.88,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/17-25.png","element":"img","alt":" J(Σ0) ≥ J(Σ⋆) =","inline":true}],[{"style":{"width":"99%"},"width":1799,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/17-26.png","element":"img"}],[{"text":"is therefore not binding.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"B.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Controllability","element":"span"}],[{"text":"We now define general conditions on a linear system that allow us to prove quantitative bounds on the strong stability of the optimal solution. We first recall the notion of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"controllability ","element":"span"},{"text":"of a system. A system defined by ","element":"span"},{"style":{"height":15.59},"width":324.34,"height":38.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/18-0.png","element":"img","alt":" xt+1 = Axt + But","inline":true,"padRight":true},{"text":"is said to be ","element":"span"},{"style":{"fontStyle":"italic"},"text":"controllable ","element":"span"},{"text":"if the matrix ","element":"span"},{"style":{"height":16},"width":287.93,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/18-1.png","element":"img","alt":" ( B AB ··· Ad−1B )","inline":true,"padRight":true},{"text":"is full rank. A standard result in control theory says that one can drive any state ","element":"span"},{"style":{"height":9.19},"width":38.78,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/18-2.png","element":"img","alt":" x0","inline":true,"padRight":true},{"text":"to zero if and only if the system is controllable. 
We define a quantitative version of this condition.","element":"span"}],[{"style":{"height":16},"width":404.43,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/18-3.png","element":"img","alt":"Definition B.3 ((k, κ)","inline":true},{"text":"-Strong Controllability)","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"text":"A system defined by ","element":"span"},{"style":{"height":15.6},"width":608.07,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/18-4.png","element":"img","alt":" xt+1 = Axt + But is (k, κ)-strongly","inline":true,"padRight":true},{"text":"controllable if the matrix ","element":"span"},{"style":{"height":18.3},"width":849.4,"height":45.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/18-5.png","element":"img","alt":" Ck = ( B AB ··· Ak−1B ) satisfies ∥(CTkCk)†∥ ≤ κ.","inline":true}],[{"text":"We first show that for a strongly controllable system, any state ","element":"span"},{"style":{"height":9.19},"width":38.78,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/18-6.png","element":"img","alt":" x0","inline":true,"padRight":true},{"text":"can be driven to zero at bounded cost.","element":"span"}],[{"id":"id-62","style":{"fontWeight":"bold"},"text":"Lemma B.4. 
","element":"span"},{"text":"Suppose that a dynamical system ","element":"span"},{"style":{"height":15.59},"width":317.47,"height":38.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/18-7.png","element":"img","alt":" xt+1 = Axt + But","inline":true,"padRight":true},{"text":"is ","element":"span"},{"style":{"height":16},"width":94.48,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/18-8.png","element":"img","alt":" (k, κ)","inline":true},{"text":"-strongly controllable and that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"R ","element":"span"},{"text":"have spectral norm at most ","element":"span"},{"style":{"height":16},"width":632.78,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/18-9.png","element":"img","alt":" 1. Let a = max(∥A∥, 1) and b = ∥B∥","inline":true},{"text":". Then there is a constant ","element":"span"},{"style":{"height":16},"width":284.35,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/18-10.png","element":"img","alt":"C = C(k, κ, a, b)","inline":true,"padRight":true},{"text":"such that the system starting at a state ","element":"span"},{"style":{"height":9.19},"width":38.78,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/18-11.png","element":"img","alt":" x0","inline":true,"padRight":true},{"text":"can be driven to zero in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"steps at cost at most ","element":"span"},{"style":{"height":17.38},"width":127.84,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/18-12.png","element":"img","alt":" C∥x0∥2","inline":true},{"text":". I.e. 
there exist ","element":"span"},{"style":{"height":15.59},"width":1123.1,"height":38.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/18-13.png","element":"img","alt":" x1, . . . , xk, u0, . . . , uk−1 such that xk = 0, xt+1 = Axt + But and","inline":true}],[{"style":{"width":"33%"},"width":603,"height":118,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/18-14.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"Consider the following quadratic program:","element":"span"}],[{"style":{"width":"62%"},"width":1121,"height":236,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/18-15.png","element":"img"}],[{"text":"Rewriting, this is equivalent to","element":"span"}],[{"style":{"width":"58%"},"width":1063,"height":182,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/18-16.png","element":"img"}],[{"text":"By Lemma ","element":"span"},{"href":"#id-60","text":"B.6, ","element":"a"},{"text":"the optimal solution is given by ","element":"span"},{"style":{"height":18.3},"width":383.51,"height":45.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/18-17.png","element":"img","alt":" (CTkCk)†Akx0, so that","inline":true}],[{"style":{"width":"44%"},"width":802,"height":117,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/18-18.png","element":"img"}],[{"text":"For this choice of ","element":"span"},{"style":{"height":9.19},"width":34.81,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/18-19.png","element":"img","alt":" ut","inline":true},{"text":"’s, the corresponding ","element":"span"},{"style":{"height":14.4},"width":187.95,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/18-20.png","element":"img","alt":" xt’s 
satisfy","inline":true}],[{"style":{"width":"40%"},"width":733,"height":255,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/18-21.png","element":"img"}],[{"text":"An easy calculation then shows that for this solution,","element":"span"}],[{"style":{"width":"75%"},"width":1371,"height":741,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/19-0.png","element":"img"}],[{"text":"We now prove a generalization of Lemma ","element":"span"},{"href":"#id-61","text":"B.2 ","element":"a"},{"text":"in terms of the zeroing cost.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Theorem B.5 ","element":"span"},{"text":"(Trace Bound)","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"text":"Suppose that matrices ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B ","element":"span"},{"text":"are such that for any ","element":"span"},{"style":{"height":9.19},"width":38.78,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/19-1.png","element":"img","alt":" x0","inline":true},{"text":", the system ","element":"span"},{"style":{"height":15.59},"width":329.42,"height":38.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/19-2.png","element":"img","alt":"xt+1 = Axt + But","inline":true,"padRight":true},{"text":"can be driven to zero in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"steps at cost ","element":"span"},{"style":{"height":17.39},"width":127.84,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/19-3.png","element":"img","alt":" C∥x0∥2","inline":true,"padRight":true},{"text":"for cost matrices ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q ","element":"span"},{"text":"= ","element":"span"},{"style":{"fontStyle":"italic"},"text":"I, R 
","element":"span"},{"text":"= ","element":"span"},{"style":{"fontStyle":"italic"},"text":"I","element":"span"},{"text":". Consider the noisy system ","element":"span"},{"style":{"height":15.6},"width":1346.55,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/19-4.png","element":"img","alt":" xt+1 = Axt + But + wt with wt ∼ N(0, W). Then for ν = C · Tr(W), the SDP","inline":true,"padRight":true},{"text":"is feasible and the trace constraint is not binding.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"By assumption, given ","element":"span"},{"style":{"height":9.19},"width":126.65,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/19-5.png","element":"img","alt":" x0 = x","inline":true},{"text":", there is a sequence of actions ","element":"span"},{"style":{"height":16},"width":416.51,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/19-6.png","element":"img","alt":" u0(x), u1(x), . . . uk−1(x)","inline":true,"padRight":true},{"text":"and corresponding states ","element":"span"},{"style":{"height":16},"width":569.64,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/19-7.png","element":"img","alt":" x = x0(x), x1(x), . . . , xk(x) = 0","inline":true,"padRight":true},{"text":"such that ","element":"span"},{"style":{"height":20.4},"width":695.63,"height":50.99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/19-8.png","element":"img","alt":"∑_{t=0}^{k−1} (∥x_t(x)∥^2 + ∥u_t(x)∥^2) ≤ C∥x∥^2","inline":true},{"text":". 
Consider the covariance matrices","element":"span"}],[{"style":{"width":"32%"},"width":579,"height":187,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/19-9.png","element":"img"}],[{"text":"From the fact that ","element":"span"},{"style":{"height":15.59},"width":312.73,"height":38.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/19-10.png","element":"img","alt":" xt+1 = Axt + But","inline":true},{"text":", it follows that","element":"span"}],[{"style":{"width":"76%"},"width":1377,"height":513,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/19-11.png","element":"img"}],[{"style":{"width":"86%"},"width":1567,"height":1035,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/20-0.png","element":"img"}],[{"text":"Further, ","element":"span"},{"style":{"height":13.2},"width":101.92,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/20-1.png","element":"img","alt":" Σ ⪰ 0","inline":true,"padRight":true},{"text":"and we have","element":"span"}],[{"style":{"width":"73%"},"width":1322,"height":584,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/20-2.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"B.3 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Solving Least Squares","element":"span"}],[{"id":"id-60","text":"The following is a standard fact about least squares regression; we give a proof for completeness.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Lemma B.6. 
","element":"span"},{"text":"Consider a QP: ","element":"span"},{"style":{"height":16.18},"width":188.3,"height":40.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/20-3.png","element":"img","alt":" minx xTAx","inline":true,"padRight":true},{"text":"subject to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Bx ","element":"span"},{"text":"= ","element":"span"},{"style":{"fontStyle":"italic"},"text":"c","element":"span"},{"text":", where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"is PD. Then the value of the optimal solution is ","element":"span"},{"style":{"height":17.78},"width":269.24,"height":44.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/20-4.png","element":"img","alt":" cT(BA−1BT)†c.","inline":true}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"When minimizing any convex function over the constraint ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Bx ","element":"span"},{"text":"= ","element":"span"},{"style":{"fontStyle":"italic"},"text":"c","element":"span"},{"text":", the gradient at the optimal solution is orthogonal to the null space of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":". Namely, there is some ","element":"span"},{"style":{"height":10.8},"width":23,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/20-5.png","element":"img","alt":" λ","inline":true,"padRight":true},{"text":"such that ","element":"span"},{"style":{"height":13.79},"width":181.09,"height":34.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/20-6.png","element":"img","alt":" Ax = BTλ","inline":true},{"text":". 
Combining this with the constraint ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Bx ","element":"span"},{"text":"= ","element":"span"},{"style":{"fontStyle":"italic"},"text":"c","element":"span"},{"text":", we can choose ","element":"span"},{"style":{"height":17.78},"width":298.37,"height":44.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/20-7.png","element":"img","alt":" λ = (BA−1BT)†c","inline":true},{"text":". Substituting this into the objective function, we get the desired result.","element":"span"}]]},{"heading":"C Bounding the Reset Cost","paragraphs":[[{"text":"Here we argue that under reasonable assumptions, the reset cost can be bounded. It will be useful to have some bound on the cost of driving a state to zero for a noiseless system. Lemma ","element":"span"},{"href":"#id-62","text":"B.4 ","element":"a"},{"text":"gives such a bound under the Strong Controllability assumption. We next give an alternative bound derived from the ","element":"span"},{"id":"id-63","text":"existence of a strongly stable policy.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Lemma C.1 ","element":"span"},{"text":"(Zeroing using Strong Stability)","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"text":"Suppose that a linear system has a ","element":"span"},{"style":{"height":16},"width":94.81,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-0.png","element":"img","alt":" (κ, γ)","inline":true},{"text":"-strongly stable policy ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K","element":"span"},{"text":". 
Then for ","element":"span"},{"style":{"height":14},"width":157.22,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-1.png","element":"img","alt":" Q, R ⪯ I","inline":true},{"text":", a start state ","element":"span"},{"style":{"height":9.19},"width":38.78,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-2.png","element":"img","alt":" x0","inline":true,"padRight":true},{"text":"can be driven to norm at most ","element":"span"},{"style":{"height":19.38},"width":37.11,"height":48.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-3.png","element":"img","alt":"1T 2","inline":true,"padRight":true},{"text":"in ","element":"span"},{"style":{"height":16},"width":298.5,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-4.png","element":"img","alt":" O(log(T∥x0∥)/γ)","inline":true,"padRight":true},{"text":"steps at cost","element":"span"}],[{"style":{"width":"16%"},"width":296,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-5.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K ","element":"span"},{"text":"be a ","element":"span"},{"style":{"height":16},"width":95.33,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-6.png","element":"img","alt":" (κ, γ)","inline":true},{"text":"-strongly stable policy. 
We argue that playing ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K ","element":"span"},{"text":"for approximately ","element":"span"},{"style":{"height":12.39},"width":120.64,"height":30.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-7.png","element":"img","alt":" tmix =","inline":true},{"style":{"height":17.38},"width":311.84,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-8.png","element":"img","alt":"(1/γ) log(∥x0∥T^2)","inline":true,"padRight":true},{"text":"steps nearly zeroes the state; indeed, at that point the residual norm falls below","element":"span"},{"style":{"height":7.6},"width":16,"height":19,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-9.png","element":"img","alt":"1","inline":true},{"style":{"height":8.79},"width":61.92,"height":21.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-10.png","element":"img","alt":"CT^2","inline":true,"padRight":true},{"text":"so that the overall overhead coming from this residual is ","element":"span"},{"style":{"fontStyle":"italic"},"text":"o","element":"span"},{"text":"(1)","element":"span"},{"text":". This is a consequence of the fact that ","element":"span"},{"text":"for the noiseless model, the steady state ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"is zero. 
By Lemma ","element":"span"},{"href":"#id-59","text":"3.2","element":"a"}],[{"style":{"width":"27%"},"width":489,"height":46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-11.png","element":"img"}],[{"text":"so that","element":"span"}],[{"style":{"width":"29%"},"width":525,"height":234,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-12.png","element":"img"}],[{"text":"Moreover, the cost of this near-reset is at most","element":"span"}],[{"style":{"width":"75%"},"width":1356,"height":230,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-13.png","element":"img"}],[{"text":"The following theorem shows how resetting can be done in the general case using Lemma ","element":"span"},{"href":"#id-62","text":"B.4, ","element":"a"},{"text":"or Lemma ","element":"span"},{"href":"#id-63","text":"C.1 ","element":"a"},{"text":"to bound the cost of driving a state to zero. It follows that under either the assumption of strong controllability, or the existence of a strongly stable policy, we can derive a bound on the cost ","element":"span"},{"style":{"height":13.59},"width":92.29,"height":33.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-14.png","element":"img","alt":" Cr of","inline":true,"padRight":true},{"text":"the reset step in FLL.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Theorem C.2 ","element":"span"},{"text":"(Resetting)","element":"span"},{"style":{"fontWeight":"bold"},"text":". 
","element":"span"},{"text":"Consider the noisy system ","element":"span"},{"style":{"height":15.59},"width":415.93,"height":38.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-15.png","element":"img","alt":" xt+1 = Axt + But + wt","inline":true,"padRight":true},{"text":"where ","element":"span"},{"style":{"height":16},"width":250.67,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-16.png","element":"img","alt":" wt ∼ N(0, W)","inline":true},{"text":". Suppose that","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"• ","element":"span"},{"text":"The noiseless system ","element":"span"},{"style":{"height":15.59},"width":314.71,"height":38.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-17.png","element":"img","alt":" xt+1 = Axt + But","inline":true,"padRight":true},{"text":"starting from ","element":"span"},{"style":{"height":9.19},"width":38.78,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-18.png","element":"img","alt":" x0","inline":true,"padRight":true},{"text":"can be driven to zero in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"steps at cost ","element":"span"},{"style":{"height":17.38},"width":218.1,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-19.png","element":"img","alt":"C∥x0∥2, and","inline":true}],[{"style":{"fontStyle":"italic"},"text":"• ","element":"span"},{"text":"for some strategy ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K","element":"span"},{"text":", the noisy system starting at state ","element":"span"},{"text":"0 ","element":"span"},{"text":"has steady state cost ","element":"span"},{"style":{"height":13.19},"width":58.51,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-20.png","element":"img","alt":" Css","inline":true,"padRight":true},{"text":"and steady state 
covariance ","element":"span"},{"style":{"height":13.19},"width":77.92,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-21.png","element":"img","alt":" Σxx.","inline":true}],[{"text":"Then, given initial state ","element":"span"},{"style":{"height":9.19},"width":38.78,"height":22.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-22.png","element":"img","alt":" x0","inline":true},{"text":", there is a sequence of actions ","element":"span"},{"style":{"height":10},"width":210.58,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-23.png","element":"img","alt":" u0, . . . , uk−1","inline":true,"padRight":true},{"text":"such that the final state satisfies ","element":"span"},{"style":{"height":18.3},"width":254.85,"height":45.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-24.png","element":"img","alt":" E[xkxTk] ⪯ Σxx","inline":true,"padRight":true},{"text":"and the cost of the first ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"steps is at most ","element":"span"},{"style":{"height":17.38},"width":271.98,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-25.png","element":"img","alt":" kCss + C∥x0∥2.","inline":true}],[{"style":{"fontStyle":"italic"},"text":"Proof. 
","element":"span"},{"text":"The idea is to use the linearity of the transition function, and split the sequence of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"states into two sequences: one that starts at ","element":"span"},{"style":{"height":9.19},"width":38.78,"height":22.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-26.png","element":"img","alt":" x0","inline":true,"padRight":true},{"text":"and the other at ","element":"span"},{"style":{"height":14},"width":110.86,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-27.png","element":"img","alt":" y0 = 0","inline":true},{"text":". We will play ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K ","element":"span"},{"text":"on the sequence ","element":"span"},{"style":{"height":10},"width":210.67,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-28.png","element":"img","alt":" y0, · · · , yk−1","inline":true},{"text":", and simultaneously drive the sequence ","element":"span"},{"style":{"height":14},"width":605.36,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-29.png","element":"img","alt":" x0, · · · , xk−1 to 0. Let u0, . . . , uk−1","inline":true,"padRight":true},{"text":"be the set of actions that drive the noiseless system starting at ","element":"span"},{"style":{"height":9.19},"width":38.78,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-30.png","element":"img","alt":" x0","inline":true,"padRight":true},{"text":"to zero. 
At time ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":", we will play the control vector ","element":"span"},{"style":{"height":14.4},"width":335.91,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-31.png","element":"img","alt":" ut + Kyt where the","inline":true,"padRight":true},{"text":"actual state of the system is ","element":"span"},{"style":{"height":12.4},"width":118.16,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-32.png","element":"img","alt":" xt + yt","inline":true},{"text":". Thus we obtain ","element":"span"},{"style":{"height":15.59},"width":315.67,"height":38.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-33.png","element":"img","alt":" xt+1 = Axt + But","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":16},"width":433.04,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-34.png","element":"img","alt":" yt+1 = (A + BK)yt + wt","inline":true},{"text":", and indeed","element":"span"}],[{"style":{"width":"44%"},"width":813,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-35.png","element":"img"}],[{"text":"After ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"rounds we will have ","element":"span"},{"style":{"height":12.79},"width":114.88,"height":31.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-36.png","element":"img","alt":" xk = 0","inline":true},{"text":", and the system would be at state ","element":"span"},{"style":{"height":10},"width":36.54,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-37.png","element":"img","alt":" yk","inline":true},{"text":". 
A simple induction proof along the lines of ","element":"span"},{"href":"#id-49","text":"Lemma 6.5 ","element":"a"},{"text":"implies that ","element":"span"},{"style":{"height":18.3},"width":245.44,"height":45.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-38.png","element":"img","alt":" E[xkxTk] ⪯ Σss","inline":true},{"text":". Finally, the sequences ","element":"span"},{"style":{"height":14.4},"width":505.09,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-39.png","element":"img","alt":" x0, . . . , xd−1 and y0, . . . , yd−1","inline":true,"padRight":true},{"text":"are statistically independent and ","element":"span"},{"style":{"height":18.67},"width":121.78,"height":46.67,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1806.07104/images/21-40.png","element":"img","alt":" (yt)d−1t=0 ","inline":true,"padRight":true},{"text":"has mean zero. As the cost is a quadratic function of the state, ","element":"span"},{"text":"the total expected cost of the reset is the sum of the expected costs of the two sequences individually.","element":"span"}]]}],"_version":"3.3.4"},"paperNode":"$1b:props:children:props:children:0:props:product"}]]]}]}]