36:[["$","audio",null,{"id":"tts"}],["$","$L3b",null,{"paperID":"1801.09627","publisher":"arxiv","paperJSON":{"title":"Barrier-Certified Adaptive Reinforcement Learning with Applications to Brushbot Navigation","paperID":"1801.09627","avgLineHeight":11.95,"imgScale":4,"sections":[{"heading":"Abstract","paragraphs":[[{"style":{"fontWeight":"bold"},"text":"$3c","element":"span"},{"style":{"fontStyle":"italic","fontWeight":"bold"},"text":"stationarity ","element":"span"},{"style":{"fontWeight":"bold"},"text":"assumptions in the safe learning literature, and is then tested on a real robot, the brushbot, whose dynamics is unknown, highly complex and nonstationary.","element":"span"}],[{"style":{"fontStyle":"italic","fontWeight":"bold"},"text":"Index Terms","element":"span"},{"style":{"fontWeight":"bold"},"text":"— Safe learning, control barrier certificate, sparse optimization, kernel adaptive filter, brushbot","element":"span"}]]},{"heading":"I. INTRODUCTION","paragraphs":[[{"text":"By exploring and interacting with an environment, reinforcement learning can determine the optimal policy with respect to the long-term rewards given to an agent [1], [2]. Whereas the idea of determining the optimal policy in terms of a cost over some time horizon is standard in the controls literature [3], reinforcement learning is aimed at learning the long-term rewards by exploring the states and actions. As such, the agent dynamics is no longer explicitly taken into account, but rather is subsumed by the data.","element":"span"}],[{"text":"This work was sponsored in part by the U.S. National Science Foundation under Grant No. 1531195. The work of M. 
Ohnishi was supported in part by the Scandinavia-Japan Sasakawa Foundation under Grant GA17-JPN-0002 and the Travel Grant of the School of Electrical Engineering, Royal Institute of Technology.","element":"span"}],[{"text":"M. Ohnishi is with the School of Electrical Engineering, Royal Institute of Technology, 11428 Stockholm, Sweden, the Georgia Robotics and Intelligent Systems Laboratory, Georgia Institute of Technology, Atlanta, GA 30332 USA, and also with the RIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan (e-mail: motoya@kth.se).","element":"span"}],[{"text":"L. Wang and M. Egerstedt are with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA (e-mail: liwang@gatech.edu; magnus@gatech.edu).","element":"span"}],[{"text":"G. Notomista is with the School of Mechanical Engineering, Georgia Institute of Technology, Atlanta, GA 30313 USA (e-mail: g.notomista@gatech.edu).","element":"span"}],[{"text":"If no information about the agent dynamics is available, however, an agent might end up in certain regions of the state space that must be avoided while exploring. Avoiding such regions of the state space is referred to as ","element":"span"},{"style":{"fontStyle":"italic"},"text":"safety","element":"span"},{"text":". 
Safety includes collision avoidance, boundary-transgression avoidance, connectivity maintenance in teams of mobile robots, and other mandatory constraints. This tension between exploration and safety is particularly pronounced in robotics, where safety is crucial.","element":"span"}],[{"text":"In this paper, we address this safety issue by employing model learning in combination with barrier certificates. In particular, we focus on learning for systems with discrete-time nonstationary (or time-varying) agent dynamics. Nonstationarity comes, for example, from failures of actuators, battery degradation, or sudden environmental disturbances. The result is a method that adapts to nonstationary agent dynamics and, under certain conditions, ensures recovery of safety in the sense of Lyapunov stability even after violations of safety due to the nonstationarity occur. We also propose discrete-time barrier certificates that guarantee global optimality of solutions to the barrier-certified policy optimization, and we use the learned model for barrier certificates.","element":"span"}],[{"text":"Over the last decade, the safety issue has been addressed under the name of safe learning, and plenty of solutions have been proposed [4]–[13]. To ensure safety while exploring, initial knowledge of the agent dynamics, an initial safe policy, or a ","element":"span"},{"style":{"fontStyle":"italic"},"text":"teacher ","element":"span"},{"text":"advising the agent is necessary [4], [14]. To obtain a model of the agent dynamics, human operators may maneuver the agent and record its trajectories [12], [15]. It is also possible that an agent continues exploring without entering the states with low long-term ","element":"span"},{"style":{"fontStyle":"italic"},"text":"risks ","element":"span"},{"text":"(e.g., [11], [16]). 
Due to the inherent uncertainty, the worst case scenario (e.g., possible lowest rewards) is typically taken into account [13], [17] and the set of safe policies can be expanded by exploring the states [4], [5]. To address the issue of this uncertainty for nonlinear-model estimation tasks, Gaussian process regression [18] is a strong tool, and many safe learning studies have taken advantage of its property (e.g., [4], [6], [7], [10], [13]).","element":"span"}],[{"text":"Nevertheless, when the agent dynamics is nonstationary and the long-term rewards vary accordingly, the assumptions often made in the safe learning literature no longer hold, and violations of safety become inevitable. In such cases, we wish to ensure that the agent is at least successfully brought back to the set of safe states and the negative effect of an unexpected violation of safety is mitigated. Moreover, the long-term rewards must also be learned in an adaptive manner. These are the core motivations of this paper.","element":"span"}],[{"text":"To constrain the states within a desired safe region while exploring, we employ control barrier functions (cf. [19]–[24]). When the exact model of the agent dynamics is available, control barrier certificates ensure that an agent remains in the set of safe states for all time by constraining the instantaneous control input at each time. Also, an agent outside of the set of safe states is forced back to safety (Proposition ","element":"span"},{"href":"#id-0","text":"III.1)","element":"a"},{"text":". A useful property of control barrier certificates is that they modify policies only when violations of safety are truly imminent [22].","element":"span"}],[{"text":"If no nominal model (or simulation) of the possibly nonstationary agent dynamics is available, on the other hand, violations of safety are inevitable. Therefore, we wish to adaptively learn the agent dynamics, and eventually bring the agent back to safety. 
To this end, we propose a learning framework for possibly nonstationary agent dynamics, which recovers safety in the sense of Lyapunov stability under some conditions. This learning framework ties adaptive algorithms with control barrier certificates by focusing on set-theoretical aspects and monotonicity (or non-expansivity). By augmenting the state with the estimate of agent dynamics, Lyapunov stability with respect to the set of augmented safe states is guaranteed (Theorem ","element":"span"},{"href":"#id-1","text":"IV.1)","element":"a"},{"text":". Also, to efficiently enforce control barrier certificates, we employ adaptive sparse optimization techniques to extract dynamic structures (e.g., control-affine dynamics) by identifying truly ","element":"span"},{"style":{"fontStyle":"italic"},"text":"active ","element":"span"},{"text":"structural components (see Sections ","element":"span"},{"href":"#id-2","text":"III-C ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-3","text":"IV-B)","element":"a"},{"text":".","element":"span"}],[{"text":"In addition, the long-term rewards need to be adaptively estimated when the agent dynamics is nonstationary. To this end, we reformulate the action-value function approximation problem so that, even if the action-value function varies, it can be adaptively estimated in the same functional space by employing an adaptive supervised learning algorithm in that space. Consequently, resetting the learning whenever the agent dynamics varies becomes unnecessary. Moreover, we present a barrier-certified policy update strategy by employing control barrier functions to effectively constrain policies. 
Because the global optimality of solutions to the constrained policy optimization is necessary to ensure the greedy improvement of a policy, we propose a discrete-time control barrier certificate that ensures the global optimality under some mild conditions (see Section ","element":"span"},{"href":"#id-4","text":"IV-C ","element":"a"},{"text":"and Theorem ","element":"span"},{"href":"#id-5","text":"IV.4 ","element":"a"},{"text":"therein). This is an improvement over the previously proposed discrete-time control barrier certificate [24].","element":"span"}],[{"text":"To validate and clarify our learning framework, we first conduct quadrotor simulation experiments. Then, we conduct real-robotics experiments on a ","element":"span"},{"style":{"fontStyle":"italic"},"text":"brushbot","element":"span"},{"text":", whose dynamics is unknown, highly complex and nonstationary, to test the efficacy of our framework in the real world (see Section ","element":"span"},{"href":"#id-6","text":"V)","element":"a"},{"text":". This is challenging because of the many uncertainties and the lack of the simulators often used in applications of reinforcement learning in robotics (see [25] for example).","element":"span"}]]},{"heading":"II. PRELIMINARIES","paragraphs":[[{"text":"In this section, we present some of the related work and the system model considered in this paper. Throughout, ","element":"span"},{"style":{"height":14.81},"width":119.89,"height":37.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/1-0.png","element":"img","alt":" R, Z≥0","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":14.01},"width":64.48,"height":35.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/1-1.png","element":"img","alt":" Z>0","inline":true,"padRight":true},{"text":"are the sets of real numbers, nonnegative integers and positive integers, respectively. 
Let ","element":"span"},{"style":{"height":16.57},"width":86.89,"height":41.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/1-2.png","element":"img","alt":" ∥·∥H","inline":true,"padRight":true},{"text":"be the norm induced by the inner product ","element":"span"},{"style":{"height":16.57},"width":104.46,"height":41.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/1-3.png","element":"img","alt":" ⟨·,·⟩H","inline":true,"padRight":true},{"text":"in an inner-product space ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H ","element":"span"},{"text":". In particular, define ","element":"span"},{"style":{"height":18.3},"width":231.59,"height":45.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/1-4.png","element":"img","alt":" ⟨x,y⟩RL := xT","inline":true},{"style":{"fontWeight":"bold"},"text":"y ","element":"span"},{"text":"for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"L","element":"span"},{"text":"-dimensional real vectors ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"height":16.99},"width":131.5,"height":42.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/1-5.png","element":"img","alt":",y ∈ RL","inline":true},{"text":", and ","element":"span"},{"style":{"height":19.2},"width":325.64,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/1-6.png","element":"img","alt":" ∥x∥RL :=�⟨x,x⟩RL","inline":true},{"text":", where ","element":"span"},{"style":{"height":17.79},"width":61.96,"height":44.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/1-7.png","element":"img","alt":" (·)T","inline":true,"padRight":true},{"text":"stands for transposition. 
We define ","element":"span"},{"text":"[","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"y","element":"span"},{"text":"] ","element":"span"},{"text":"as ","element":"span"},{"style":{"height":17.79},"width":141.86,"height":44.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/1-8.png","element":"img","alt":" [xT,yT]T","inline":true},{"text":", and let ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":12.4},"width":184.28,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/1-9.png","element":"img","alt":" ∈ X ⊂ Rnx","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":12.4},"width":178.43,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/1-10.png","element":"img","alt":" ∈ U ⊂ Rnu","inline":true},{"text":", for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"fontStyle":"italic"},"text":"u ","element":"span"},{"style":{"height":14.01},"width":98.78,"height":35.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/1-11.png","element":"img","alt":" ∈ Z>0","inline":true},{"text":", denote the state and the control input at time instant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":14.81},"width":100.12,"height":37.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/1-12.png","element":"img","alt":" ∈ 
R≥0","inline":true},{"text":", respectively.","element":"span"}],[{"id":"id-56","style":{"fontStyle":"italic"},"text":"A. Related Work","element":"span"}],[{"text":"The primary focus of this paper is the safety issue ","element":"span"},{"style":{"fontStyle":"italic"},"text":"while exploring","element":"span"},{"text":". Typically, some initial knowledge, such as an initial safe policy and a model of the agent dynamics, is required to address the safety issue while exploring; therefore, model learning is often employed as well. We introduce some related work on model learning and kernel-based action-value function approximation.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"1) Model Learning for Safe Maneuver: ","element":"span"},{"text":"The recent work in [13], [7], and [4] assumes an initial conservative set of safe policies, which is gradually expanded as more data become available. These approaches are designed for stationary agent dynamics, and Gaussian processes (GPs) are employed to obtain the confidence interval of the model. To ensure safety, control barrier functions and control Lyapunov functions are employed in [13] and [4], respectively. On the other hand, the work in [10] uses trajectory optimization based on receding horizon control and model learning by GPs, which is computationally expensive when the model is highly nonlinear.","element":"span"}],[{"text":"In this paper, we aim at tying adaptive model learning algorithms and control barrier certificates by focusing on set-theoretical aspects and monotonicity (or non-expansivity). Hence, we employ an adaptive filter with a monotone approximation property, which shares similar ideas with stable online learning for adaptive control based on Lyapunov stability (cf. 
[26]–[29], for example).","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"2) Learning Dynamic Structures in Reproducing Kernel Hilbert Spaces: ","element":"span"},{"text":"An approach that learns dynamics in reproducing kernel Hilbert spaces (RKHSs) so that the resulting model satisfies the Euler-Lagrange equation was proposed in [30], while our paper proposes a learning framework that adaptively captures control-affine structure in RKHSs to efficiently enforce control barrier certificates.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"3) Reinforcement Learning in Reproducing Kernel Hilbert Spaces: ","element":"span"},{"text":"We briefly introduce the ideas behind existing action-value function approximation techniques. Given a policy ","element":"span"},{"style":{"height":15.2},"width":153.38,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/1-13.png","element":"img","alt":" φ : X →","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"U ","element":"span"},{"text":", the action-value function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"style":{"height":10.8},"width":17,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/1-14.png","element":"img","alt":"φ","inline":true,"padRight":true},{"text":"associated with the policy ","element":"span"},{"style":{"height":14.8},"width":23,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/1-15.png","element":"img","alt":" φ","inline":true,"padRight":true},{"text":"is defined as","element":"span"}],[{"id":"id-34","style":{"width":"91%"},"width":922,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/1-16.png","element":"img"}],[{"text":"where 
","element":"span"},{"style":{"height":16},"width":147.62,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/1-17.png","element":"img","alt":" γ ∈ (0,1)","inline":true,"padRight":true},{"text":"is the discount factor, ","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":18.18},"width":100.08,"height":45.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/1-18.png","element":"img","alt":")n∈Z≥0","inline":true,"padRight":true},{"text":"is a trajectory of the agent starting from ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":"0 ","element":"span"},{"text":"= ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":", and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"R","element":"span"},{"style":{"height":16},"width":169.6,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/1-19.png","element":"img","alt":"(x,u) ∈ R","inline":true,"padRight":true},{"text":"is the immediate reward. It is known that the action-value function follows the Bellman equation (cf. [2, Equation (66)]):","element":"span"}],[{"id":"id-16","style":{"width":"91%"},"width":922,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/1-20.png","element":"img"}],[{"text":"For robotics applications, where the states and controls are continuous, some form of function approximator is required to approximate the action-value function (and/or policies). Nonparametric learning such as a kernel method is often desirable when ","element":"span"},{"style":{"fontStyle":"italic"},"text":"a priori ","element":"span"},{"text":"knowledge about a suitable set of basis functions for learning is unavailable. 
Kernel-based reinforcement learning has been studied in the literature, e.g., [31], [31]–[44]. Due to the property of reproducing kernels, the framework of linear learning algorithms is directly applied to nonlinear function estimation tasks in a possibly infinite-dimensional functional space, namely a reproducing kernel Hilbert space.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Definition II.1 ","element":"span"},{"text":"( [45, page 343])","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"text":"Given a nonempty set ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Z ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H ","element":"span"},{"text":"which is a Hilbert space defined in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Z ","element":"span"},{"text":", the function ","element":"span"},{"style":{"height":16},"width":123.01,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-0.png","element":"img","alt":"κ (z,w)","inline":true,"padRight":true},{"text":"of ","element":"span"},{"style":{"fontWeight":"bold"},"text":"z ","element":"span"},{"text":"is called a reproducing kernel of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H ","element":"span"},{"text":"if","element":"span"}],[{"text":"1) for every ","element":"span"},{"style":{"fontWeight":"bold"},"text":"w ","element":"span"},{"style":{"height":16},"width":218.6,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-1.png","element":"img","alt":" ∈ Z , κ (z,w)","inline":true,"padRight":true},{"text":"as a function of ","element":"span"},{"style":{"fontWeight":"bold"},"text":"z ","element":"span"},{"style":{"height":12.4},"width":75.5,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-2.png","element":"img","alt":" ∈ Z","inline":true,"padRight":true},{"text":"belongs to 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"H ","element":"span"},{"text":", and","element":"span"}],[{"text":"2) it has the reproducing property, i.e., the following holds for every ","element":"span"},{"style":{"fontWeight":"bold"},"text":"w ","element":"span"},{"style":{"height":12.4},"width":77.4,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-3.png","element":"img","alt":" ∈ Z","inline":true,"padRight":true},{"text":"and every ","element":"span"},{"style":{"height":15.2},"width":121.59,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-4.png","element":"img","alt":" ϕ ∈ H","inline":true,"padRight":true},{"text":":","element":"span"}],[{"style":{"width":"37%"},"width":375,"height":43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-5.png","element":"img"}],[{"text":"If ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H ","element":"span"},{"text":"has a reproducing kernel, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H ","element":"span"},{"text":"is called a Reproducing Kernel Hilbert Space (RKHS).","element":"span"}],[{"text":"One of the examples of kernels is the Gaussian kernel","element":"span"}],[{"style":{"width":"99%"},"width":998,"height":100,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-6.png","element":"img"}],[{"text":"It is well-known that the Gaussian reproducing kernel Hilbert space has universality [46], i.e., any continuous function on every compact subset of ","element":"span"},{"style":{"height":13.39},"width":46.78,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-7.png","element":"img","alt":" RL","inline":true,"padRight":true},{"text":"can be approximated with an arbitrary accuracy. 
Another widely used kernel is the polynomial kernel ","element":"span"},{"style":{"height":18.19},"width":598.99,"height":45.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-8.png","element":"img","alt":" κ(x,y) := (xTy+c)d, c ≥ 0,d ∈ Z>0","inline":true},{"text":".","element":"span"}],[{"text":"In contrast to these existing approaches, we explicitly define an RKHS so that adaptive supervised learning of action-value functions can be conducted in the same space without having to reset the learning. Consequently, we can also conduct action-value function approximation in the same RKHS even after the agent dynamics changes or policies are updated (see the remark below Theorem ","element":"span"},{"href":"#id-7","text":"IV.3 ","element":"a"},{"text":"and Section ","element":"span"},{"href":"#id-8","text":"V-A.2)","element":"a"},{"text":". The GP SARSA can also be reproduced by employing a GP in the explicitly defined RKHS, as discussed in Appendix ","element":"span"},{"text":"I. ","element":"span"},{"text":"Specifically, in this paper, a possibly nonstationary agent dynamics is considered, as detailed below.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"B. 
System Model","element":"span"}],[{"text":"In this paper, we consider the following discrete-time deterministic nonlinear model of the nonstationary agent dynamics,","element":"span"}],[{"style":{"width":"92%"},"width":924,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-9.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":": ","element":"span"},{"style":{"height":12},"width":256.98,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-10.png","element":"img","alt":" X × U → Rnx","inline":true},{"text":", ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":": ","element":"span"},{"style":{"height":12},"width":168.12,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-11.png","element":"img","alt":" X → Rnx","inline":true},{"text":", ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":": ","element":"span"},{"style":{"height":12},"width":218.46,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-12.png","element":"img","alt":" X → Rnx×nu","inline":true,"padRight":true},{"text":"are continuous. 
Hereafter, we regard ","element":"span"},{"style":{"height":11.6},"width":133.84,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-13.png","element":"img","alt":" X × U","inline":true,"padRight":true},{"text":"as the same as ","element":"span"},{"style":{"height":13.39},"width":195.71,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-14.png","element":"img","alt":" Z ⊂ Rnx+nu","inline":true,"padRight":true},{"text":"under the one-to-one correspondence between ","element":"span"},{"style":{"fontWeight":"bold"},"text":"z ","element":"span"},{"text":":","element":"span"},{"text":"= [","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"height":16},"width":97.29,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-15.png","element":"img","alt":"] ∈ Z","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":16},"width":261.73,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-16.png","element":"img","alt":" (x,u) ∈ X ×U","inline":true,"padRight":true},{"text":"if there is no confusion.","element":"span"}],[{"text":"We consider an agent with dynamics given in ","element":"span"},{"href":"#id-9","text":"(II.3)","element":"a"},{"text":", and the goal is to find an optimal policy which drives the agent to a desirable state ","element":"span"},{"style":{"fontStyle":"italic"},"text":"while remaining in the set of safe states (or the safe set) ","element":"span"},{"style":{"height":12.4},"width":127.91,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-17.png","element":"img","alt":" C ⊂ X","inline":true,"padRight":true},{"text":"defined 
as","element":"span"}],[{"style":{"width":"91%"},"width":923,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-18.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B ","element":"span"},{"text":": ","element":"span"},{"style":{"height":12},"width":132.05,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-19.png","element":"img","alt":" X → R","inline":true},{"text":". An optimal policy is a policy ","element":"span"},{"style":{"height":14.8},"width":23,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-20.png","element":"img","alt":" φ","inline":true,"padRight":true},{"text":"that attains an optimal value ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"style":{"height":17.79},"width":163.58,"height":44.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-21.png","element":"img","alt":"φ(x,φ(x))","inline":true,"padRight":true},{"text":"for every state ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x ","element":"span"},{"style":{"height":12.4},"width":80.29,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-22.png","element":"img","alt":" ∈ X","inline":true,"padRight":true},{"text":". 
Note that the value associated with a policy varies when the dynamics is nonstationary, and that a quadruple ","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":95.46,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-23.png","element":"img","alt":"+1,R(","inline":true},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":")) ","element":"span"},{"text":"is available at each time instant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":".","element":"span"}],[{"text":"With these preliminaries in place, we can present our safe learning framework.","element":"span"}]]},{"heading":"III. SAFE LEARNING FRAMEWORK","paragraphs":[[{"text":"Under possibly nonstationary dynamics, our safe learning framework adaptively estimates the long-term rewards to update policies with safety constraints. Also, recovery of safety in the sense of Lyapunov stability during exploration is guaranteed ","element":"span"},{"id":"id-9","text":"under certain conditions. 
Define ","element":"span"},{"style":{"height":15.2},"width":181.82,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-24.png","element":"img","alt":" ψ : Z → R","inline":true,"padRight":true},{"text":"as ","element":"span"},{"style":{"height":16},"width":168.44,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-25.png","element":"img","alt":" ψ(x,u) :=","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"p","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"text":")+ ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":")+","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"text":", and suppose that the estimator of ","element":"span"},{"style":{"height":12},"width":32,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-26.png","element":"img","alt":" ψ","inline":true,"padRight":true},{"text":"at time instant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":", denoted by ˆ","element":"span"},{"style":{"height":12},"width":42.34,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-27.png","element":"img","alt":"ψn","inline":true},{"text":", is approximated by the model parameter ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n 
","element":"span"},{"style":{"height":14.2},"width":227.79,"height":35.49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-28.png","element":"img","alt":" ∈ Rr, r ∈ Z>0","inline":true,"padRight":true},{"text":"in the linear form as","element":"span"}],[{"style":{"width":"30%"},"width":306,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-29.png","element":"img"}],[{"text":"Here, ","element":"span"},{"style":{"fontWeight":"bold"},"text":"k","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":106.96,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-30.png","element":"img","alt":") ∈ Rr","inline":true,"padRight":true},{"text":"is the output of basis functions at ","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":". If the model parameter is accurately estimated (or the exact agent dynamics is available), the safe set ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"becomes forward invariant and asymptotically stable by enforcing control barrier certificates at each time instant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":".","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"A. Discrete-time Control Barrier Functions","element":"span"}],[{"text":"The idea of control barrier functions is similar to Lyapunov functions; they require no explicit computations of the forward reachable set while ensuring certain properties by constraining the instantaneous control input. 
Particularly, control barrier functions guarantee that an agent starting from the safe set remains safe (i.e., forward invariance), and that an agent outside of the safe set is forced back to safety (i.e., Lyapunov stability with respect to the safe set). To make barrier certificates compatible with model learning and reinforcement learning, we employ the discrete-time control barrier certificates.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Definition III.1 ","element":"span"},{"text":"( [24, Definition 4])","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"text":"A map ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B ","element":"span"},{"text":": ","element":"span"},{"style":{"height":12},"width":136.54,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-31.png","element":"img","alt":" X → R","inline":true,"padRight":true},{"text":"is a discrete-time exponential control barrier function if there exists a control input ","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":12.4},"width":74.4,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-32.png","element":"img","alt":" ∈ U","inline":true,"padRight":true},{"text":"such that","element":"span"}],[{"style":{"width":"92%"},"width":924,"height":86,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-33.png","element":"img"}],[{"text":"Note that we intentionally removed the condition ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":"0","element":"span"},{"style":{"height":16},"width":54.36,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-34.png","element":"img","alt":") 
≥","inline":true,"padRight":true},{"text":"0 originally presented in [24, Definition 4]. Then, the forward invariance and asymptotic stability with respect to the safe set are ensured by the following proposition.","element":"span"}],[{"id":"id-0","style":{"fontWeight":"bold"},"text":"Proposition III.1. ","element":"span"},{"text":"The set ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"defined in ","element":"span"},{"href":"#id-10","text":"(II.4) ","element":"a"},{"text":"for a valid discrete-time exponential control barrier function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B ","element":"span"},{"text":": ","element":"span"},{"style":{"height":12},"width":134.8,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-35.png","element":"img","alt":" X → R","inline":true,"padRight":true},{"text":"is forward invariant when ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":"0","element":"span"},{"style":{"height":16},"width":58.03,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-36.png","element":"img","alt":") ≥","inline":true,"padRight":true},{"text":"0, and is asymptotically stable when ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":"0","element":"span"},{"text":") ","element":"span"},{"style":{"fontStyle":"italic"},"text":"< ","element":"span"},{"text":"0.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Proof. 
","element":"span"},{"text":"See Appendix ","element":"span"},{"text":"A.","element":"span"}],[{"text":"Proposition ","element":"span"},{"href":"#id-0","text":"III.1 ","element":"a"},{"text":"implies that an agent remains in the safe set ","element":"span"},{"id":"id-10","text":"defined in ","element":"span"},{"href":"#id-10","text":"(II.4) ","element":"a"},{"text":"for all time if ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":"0","element":"span"},{"style":{"height":16},"width":56.61,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/2-37.png","element":"img","alt":") ≥","inline":true,"padRight":true},{"text":"0 and the inequality ","element":"span"},{"text":"(III.1) ","element":"span"},{"text":"are satisfied, and the agent outside of the safe set is brought back to safety.","element":"span"}],[{"text":"The main motivations of using control barrier functions are given below:","element":"span"}],[{"style":{"width":"71%"},"width":720,"height":540,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-0.png","element":"img"}],[{"text":"Fig. III.1. ","element":"figcaption","subtype":"caption"},{"id":"id-11","text":"An illustration of the monotone approximation property. 
The ","element":"figcaption","subtype":"caption"},{"text":"estimate ","element":"figcaption","subtype":"caption"},{"style":{"fontWeight":"bold"},"text":"h","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"n ","element":"figcaption","subtype":"caption"},{"text":"monotonically approaches the set ","element":"figcaption","subtype":"caption"},{"style":{"height":8.8},"width":25,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-1.png","element":"img","alt":" Ω","inline":true,"padRight":true},{"text":"of optimal vectors ","element":"figcaption","subtype":"caption"},{"style":{"fontWeight":"bold"},"text":"h","element":"figcaption","subtype":"caption"},{"style":{"height":4.4},"width":12,"height":11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-2.png","element":"img","alt":"∗","inline":true,"padRight":true},{"text":"by sequentially minimizing the distance between ","element":"figcaption","subtype":"caption"},{"style":{"fontWeight":"bold"},"text":"h","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"n ","element":"figcaption","subtype":"caption"},{"text":"and ","element":"figcaption","subtype":"caption"},{"style":{"height":10.8},"width":226.93,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-3.png","element":"img","alt":" Ωn. 
Here, Ωn :=","inline":true,"padRight":true},{"text":"argmin","element":"figcaption","subtype":"caption"},{"style":{"fontWeight":"bold"},"text":"h","element":"figcaption","subtype":"caption"},{"style":{"height":13.24},"width":316.8,"height":33.1,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-4.png","element":"img","alt":"∈Rr Θn(h), where Θn(h)","inline":true,"padRight":true},{"text":"is the cost function at time instant ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"n","element":"figcaption","subtype":"caption"},{"text":".","element":"figcaption","subtype":"caption"}],[{"text":"a). Minimal modification of policies: control barrier functions modify policies only when violations of safety are imminent. Consequently, an inaccurate or rough estimate of the model has a smaller negative effect on (model-free) reinforcement learning.","element":"span"}],[{"text":"b). Asymptotic stability of the safe set: an agent outside of the safe set is brought back to the safe set. In addition to Proposition ","element":"span"},{"href":"#id-0","text":"III.1, ","element":"a"},{"text":"this robustness property is analyzed in [19]. This property, together with the adaptive model learning algorithm presented in the next subsection, is particularly important when safety is violated due to the nonstationarity of the agent dynamics. Under possibly nonstationary agent dynamics, we can no longer guarantee that the current estimate of the model parameter is sufficiently accurate to enforce the inequality ","element":"span"},{"text":"(III.1) ","element":"span"},{"text":"or forward invariance of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":". 
Nevertheless, we are still able to show that safety is recovered in the sense of Lyapunov stability under certain conditions by ","element":"span"},{"style":{"fontStyle":"italic"},"text":"adaptively ","element":"span"},{"text":"learning the model.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"B. Adaptive Model Learning Algorithms with Monotone Approximation Property","element":"span"}],[{"text":"At each time instant, an input-output pair ","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":67.89,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-5.png","element":"img","alt":",δn)","inline":true},{"text":", where ","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":":","element":"span"},{"text":"= [","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"] ","element":"span"},{"text":"and ","element":"span"},{"style":{"height":14.39},"width":93.76,"height":35.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-6.png","element":"img","alt":" δn :=","inline":true,"padRight":true},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":8.8},"width":80.45,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-7.png","element":"img","alt":"+1 −","inline":true,"padRight":true},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"for 
model learning is available. Under possibly nonstationary agent dynamics, it is vital for the model parameter estimation to be stable even after the agent dynamics changes. In this paper, we employ an adaptive algorithm with the monotone approximation property. Note that this approach shares a similar idea with stable online learning based on Lyapunov-like conditions.","element":"span"}],[{"text":"Suppose that the estimate of the model parameter at time instant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"is given by ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":14.19},"width":229.36,"height":35.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-8.png","element":"img","alt":" ∈ Rr, r ∈ Z>0","inline":true},{"text":". Given a cost function ","element":"span"},{"style":{"height":16},"width":99.88,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-9.png","element":"img","alt":" Θn(h)","inline":true,"padRight":true},{"text":"at time instant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":", we update the parameter ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"so as to satisfy the strictly monotone approximation property ","element":"span"},{"style":{"height":16},"width":20,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-10.png","element":"img","alt":" ∥","inline":true},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16.72},"width":213.45,"height":41.8,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-11.png","element":"img","alt":"+1 −h∗n∥Rr 
<","inline":true},{"style":{"height":16},"width":20,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-12.png","element":"img","alt":"∥","inline":true},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":16.72},"width":325.24,"height":41.8,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-13.png","element":"img","alt":" −h∗n∥Rr , ∀h∗n ∈ Ωn","inline":true,"padRight":true},{"text":":","element":"span"},{"text":"= ","element":"span"},{"text":"argmin","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"height":16.84},"width":159.08,"height":42.1,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-14.png","element":"img","alt":"∈Rr Θn(h)","inline":true,"padRight":true},{"text":"if ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":16},"width":132.7,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-15.png","element":"img","alt":"∉ Ωn ≠","inline":true,"padRight":true},{"text":"∅, where ∅ is the empty set. 
Then, if ","element":"span"},{"style":{"height":18.52},"width":267.32,"height":46.3,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-16.png","element":"img","alt":" Ω := ⋂n∈Z≥0 Ωn","inline":true,"padRight":true},{"text":"is ","element":"span"},{"text":"nonempty and if ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":16},"width":84.34,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-17.png","element":"img","alt":" ∉ Ωn","inline":true},{"text":", it follows that ","element":"span"},{"style":{"height":16},"width":20,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-18.png","element":"img","alt":" ∥","inline":true},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16.58},"width":218.82,"height":41.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-19.png","element":"img","alt":"+1 −h∗∥Rr <","inline":true},{"style":{"height":16},"width":20,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-20.png","element":"img","alt":"∥","inline":true},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":16.57},"width":455.45,"height":41.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-21.png","element":"img","alt":" −h∗∥Rr , ∀h∗ ∈ Ω, n ∈ Z≥0","inline":true},{"text":". This is illustrated in Figure ","element":"span"},{"href":"#id-11","text":"III.1. 
","element":"a"},{"text":"Under mild conditions, we can also design algorithms (e.g., the adaptive projected subgradient method [47]) that satisfy ","element":"span"},{"style":{"height":16},"width":20,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-22.png","element":"img","alt":" ∥","inline":true},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":19.59},"width":201.18,"height":48.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-23.png","element":"img","alt":" −h∗n∥2Rr − ∥","inline":true},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":19.59},"width":277.54,"height":48.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-24.png","element":"img","alt":"+1 −h∗n∥2Rr ≥ ρ23","inline":true},{"style":{"fontStyle":"italic"},"text":"dist","element":"span"},{"text":"2","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":78.81,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-25.png","element":"img","alt":",Ωn)","inline":true},{"text":", for","element":"span"}],[{"style":{"width":"61%"},"width":621,"height":290,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-26.png","element":"img"}],[{"text":"Fig. III.2. 
","element":"span"},{"id":"id-12","text":"An illustration of Lyapunov stability of the system for the ","element":"span"},{"text":"augmented state ","element":"span"},{"style":{"height":13.43},"width":181.51,"height":33.57,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-27.png","element":"img","alt":" [x;h] ∈ Rnx+r ","inline":true,"padRight":true},{"text":"with respect to the forward invariant set ","element":"span"},{"style":{"height":11.03},"width":207.94,"height":27.57,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-28.png","element":"img","alt":"C ×Ω ⊂ Rnx+r.","inline":true}],[{"text":"all ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"height":15.7},"width":115.54,"height":39.26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-29.png","element":"img","alt":"∗n ∈ Ωn","inline":true},{"text":", and for some ","element":"span"},{"style":{"height":12.8},"width":82.94,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-30.png","element":"img","alt":" ρ3 >","inline":true,"padRight":true},{"text":"0, where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"dist","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":133.68,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-31.png","element":"img","alt":",Ωn) :=","inline":true,"padRight":true},{"text":"inf","element":"span"},{"style":{"height":16.06},"width":39.93,"height":40.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-32.png","element":"img","alt":"{∥","inline":true},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n 
","element":"span"},{"style":{"height":16.72},"width":298.06,"height":41.8,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-33.png","element":"img","alt":" −h∗n∥Rr |h∗n ∈ Ωn}","inline":true},{"text":". (See [47] for more detailed ","element":"span"},{"text":"arguments, for example.)","element":"span"}],[{"text":"At each time instant, we use the current estimate of the model to constrain control inputs so that they satisfy","element":"span"}],[{"style":{"width":"91%"},"width":923,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-34.png","element":"img"}],[{"text":"for some margin ","element":"span"},{"style":{"height":12.8},"width":79.66,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-35.png","element":"img","alt":" ρ1 >","inline":true,"padRight":true},{"text":"0, where ˆ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":8.8},"width":37.92,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-36.png","element":"img","alt":"+1","inline":true,"padRight":true},{"text":"is the predicted output of the current estimate ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"at ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":". 
Then, under certain conditions, we can guarantee Lyapunov stability of the system for the augmented state ","element":"span"},{"text":"[","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16.59},"width":150.74,"height":41.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-37.png","element":"img","alt":"] ∈ Rnx+r","inline":true,"padRight":true},{"text":"with respect to the forward invariant set ","element":"span"},{"style":{"height":13.39},"width":250.28,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-38.png","element":"img","alt":" C × Ω ⊂ Rnx+r","inline":true,"padRight":true},{"text":"as illustrated in Figure ","element":"span"},{"href":"#id-12","text":"III.2. ","element":"a"},{"text":"In Sections ","element":"span"},{"href":"#id-13","text":"IV-A ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-6","text":"V, ","element":"a"},{"text":"we show theoretically and experimentally that the system for the augmented state is stable on the set of augmented safe states.","element":"span"}],[{"text":"To efficiently constrain policies by using control barrier functions, the learned model should preferably be affine in control (see Section ","element":"span"},{"href":"#id-4","text":"IV-C ","element":"a"},{"text":"and Theorem ","element":"span"},{"href":"#id-5","text":"IV.4 ","element":"a"},{"text":"therein). As such, outputs of the learned model should have preferred dynamic structures while capturing the true agent dynamics.","element":"span"}],[{"id":"id-2","style":{"fontStyle":"italic"},"text":"C. 
Learning Dynamic Structure via Sparse Optimizations","element":"span"}],[{"text":"Control-affine dynamics is given by ","element":"span"},{"href":"#id-9","text":"(II.3) ","element":"a"},{"text":"with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"= ","element":"span"},{"text":"0, where 0 denotes the null function. Therefore, the simplest approach is to learn the agent dynamics with the constraint ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"= ","element":"span"},{"text":"0. In practice, however, it is unrealistic to assume that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"= ","element":"span"},{"text":"0 due to the effects of friction and other disturbances. Instead, as long as the term ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"is negligibly small, we can consider ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"to be a ","element":"span"},{"style":{"fontStyle":"italic"},"text":"system noise ","element":"span"},{"text":"added to control-affine dynamics. To encourage the term ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"to be as small as possible while capturing the true input-output relations of the agent dynamics, we use adaptive sparse optimization techniques. In particular, motivated by the monotone approximation property due to convexity of the formulations, we use (sparse) kernel adaptive filters for systems with nonlinear dynamics. 
Specifically, we take the following steps to extract the control-affine structure:","element":"span"}],[{"text":"1) Assume ","element":"span"},{"text":"for ","element":"span"},{"text":"simplicity ","element":"span"},{"text":"that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"fontStyle":"italic"},"text":"x ","element":"span"},{"text":"= ","element":"span"},{"text":"1. ","element":"span"},{"text":"We ","element":"span"},{"text":"suppose that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"style":{"height":16.39},"width":100.52,"height":40.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-39.png","element":"img","alt":" ∈ Hp","inline":true},{"text":", ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"style":{"height":16.44},"width":102.74,"height":41.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-40.png","element":"img","alt":" ∈ Hf","inline":true,"padRight":true},{"text":", and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"style":{"height":19.38},"width":362.64,"height":48.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-41.png","element":"img","alt":"(1),g(2),...,g(nu) ∈ Hg","inline":true},{"text":", where ","element":"span"},{"style":{"height":16.44},"width":155.58,"height":41.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-42.png","element":"img","alt":" Hp, Hf","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":16.39},"width":53.61,"height":40.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-43.png","element":"img","alt":" Hg","inline":true,"padRight":true},{"text":"are RKHSs, and 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":") = ","element":"span"},{"style":{"height":18.59},"width":464.39,"height":46.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-44.png","element":"img","alt":"[g(1)(x),g(2)(x),··· ,g(nu)(x)]","inline":true},{"text":".","element":"span"}],[{"text":"2) Let ","element":"span"},{"style":{"height":13.99},"width":53.62,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-45.png","element":"img","alt":" Hu","inline":true,"padRight":true},{"text":"be the RKHS associated with the reproducing kernel ","element":"span"},{"style":{"height":17.79},"width":436.06,"height":44.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-46.png","element":"img","alt":" κ (u,v) := uTv, u,v ∈ U","inline":true,"padRight":true},{"text":", and ","element":"span"},{"style":{"height":13.99},"width":51.62,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-47.png","element":"img","alt":" Hc","inline":true,"padRight":true},{"text":"the set of constant functions on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"U ","element":"span"},{"text":". 
Estimate the function ","element":"span"},{"style":{"height":12},"width":32,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-48.png","element":"img","alt":" ψ","inline":true,"padRight":true},{"text":"in the RKHS ","element":"span"},{"style":{"height":16.79},"width":571.68,"height":41.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/3-49.png","element":"img","alt":" Hψ := Hp + Hf ⊗ Hc + Hg ⊗ Hu","inline":true,"padRight":true},{"text":"(see Section ","element":"span"},{"href":"#id-3","text":"IV-B ","element":"a"},{"text":"and Theorem ","element":"span"},{"href":"#id-14","text":"IV.2 ","element":"a"},{"text":"therein).","element":"span"}],[{"text":"3) Define the cost ","element":"span"},{"style":{"height":13.59},"width":44.53,"height":33.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-0.png","element":"img","alt":" Θn","inline":true,"padRight":true},{"text":"so as to promote sparsity of the model parameter. If the underlying true dynamics is affine in control, a control-affine model (i.e., the estimate of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"denoted by ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"becomes null) is expected to be extracted. The resulting control-affine part of the estimated dynamics is used in combination with control barrier certificates in order to efficiently constrain policies during and after learning an optimal policy. (See Theorem ","element":"span"},{"href":"#id-1","text":"IV.1 ","element":"a"},{"text":"and Theorem ","element":"span"},{"href":"#id-5","text":"IV.4 ","element":"a"},{"text":"for more details.)","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"D. Barrier-certified Policy Update","element":"span"}],[{"text":"Lastly, we present the barrier-certified policy update strategy. 
To update policies, we use the long-term rewards, which need to be adaptively estimated for systems with possibly nonstationary agent dynamics.","element":"span"}],[{"id":"id-21","style":{"fontStyle":"italic"},"text":"1) Adaptive ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Action-value ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Approximation ","element":"span"},{"style":{"fontStyle":"italic"},"text":"in","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"RKHSs: ","element":"span"},{"text":"Again, motivated by the monotone approximation property (see Corollary ","element":"span"},{"href":"#id-15","text":"IV.1) ","element":"a"},{"text":"and the flexibility of nonparametric learning that requires no fixed set of basis functions, we employ kernel-based adaptive algorithms to estimate the action-value function. One of the issues arising when applying a kernel-based method to action-value function approximation is that the output of the action-value function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"style":{"height":17.79},"width":36.66,"height":44.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-1.png","element":"img","alt":"φ(","inline":true},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16.44},"width":130.74,"height":41.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-2.png","element":"img","alt":") ∈ HQ","inline":true,"padRight":true},{"text":"associated with a policy 
","element":"span"},{"style":{"height":14.8},"width":23,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-3.png","element":"img","alt":" φ","inline":true},{"text":", where ","element":"span"},{"style":{"height":16.04},"width":59.62,"height":40.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-4.png","element":"img","alt":" HQ","inline":true,"padRight":true},{"text":"is assumed to be an RKHS, is unobservable. Nevertheless, we know that the action-value function follows the Bellman equation ","element":"span"},{"href":"#id-16","text":"(II.2)","element":"a"},{"text":". Hence, by defining a function ","element":"span"},{"style":{"height":16.99},"width":51.64,"height":42.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-5.png","element":"img","alt":" ψQ","inline":true,"padRight":true},{"text":": ","element":"span"},{"style":{"height":13.79},"width":146.26,"height":34.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-6.png","element":"img","alt":" Z 2 → R","inline":true},{"text":", where ","element":"span"},{"style":{"height":15.39},"width":430.11,"height":38.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-7.png","element":"img","alt":" R2(nx+nu) ⊃ Z 2 = Z ×Z","inline":true,"padRight":true},{"text":", as","element":"span"}],[{"style":{"width":"92%"},"width":924,"height":107,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-8.png","element":"img"}],[{"text":"the ","element":"span"},{"text":"Bellman ","element":"span"},{"text":"equation ","element":"span"},{"text":"in ","element":"span"},{"href":"#id-16","text":"(II.2) ","element":"a"},{"text":"is ","element":"span"},{"text":"solved ","element":"span"},{"text":"via ","element":"span"},{"text":"iterative nonlinear function estimation with the input-output pairs 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"{","element":"span"},{"text":"([","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":96.38,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-9.png","element":"img","alt":"+1;φ(","inline":true},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":121.96,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-10.png","element":"img","alt":"+1)],R(","inline":true},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":18.19},"width":135.47,"height":45.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-11.png","element":"img","alt":"))}n∈Z≥0","inline":true},{"text":". 
In fact, the function ","element":"span"},{"style":{"height":16.99},"width":51.64,"height":42.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-12.png","element":"img","alt":"ψQ","inline":true,"padRight":true},{"text":"is an element of a properly constructed RKHS ","element":"span"},{"style":{"height":18.75},"width":78.29,"height":46.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-13.png","element":"img","alt":" HψQ","inline":true,"padRight":true},{"text":"(see Section ","element":"span"},{"href":"#id-4","text":"IV-C ","element":"a"},{"text":"and Theorem ","element":"span"},{"href":"#id-7","text":"IV.3 ","element":"a"},{"text":"therein). Because the domain of ","element":"span"},{"style":{"height":18.76},"width":78.29,"height":46.89,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-14.png","element":"img","alt":" HψQ","inline":true,"padRight":true},{"text":"is defined as ","element":"span"},{"style":{"height":11.6},"width":125.96,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-15.png","element":"img","alt":" Z ×Z","inline":true,"padRight":true},{"text":"instead of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Z ","element":"span"},{"text":", the RKHS ","element":"span"},{"style":{"height":18.76},"width":78.29,"height":46.89,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-16.png","element":"img","alt":" HψQ","inline":true,"padRight":true},{"text":"does not depend on the agent dynamics. 
Therefore, we do not have to reset learning even after the dynamics changes or the policy is updated, and we can analyze the convergence and/or the monotone approximation property of an action-value function approximation in the same RKHS (see Section ","element":"span"},{"href":"#id-8","text":"V- ","element":"a"},{"href":"#id-8","text":"A.2, ","element":"a"},{"text":"for example).","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"2) Policy Update: ","element":"span"},{"text":"For a current policy ","element":"span"},{"style":{"height":15.2},"width":191.12,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-17.png","element":"img","alt":" φ : X → U","inline":true,"padRight":true},{"text":", assume that the action-value function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"style":{"height":10.8},"width":17,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-18.png","element":"img","alt":"φ","inline":true,"padRight":true},{"text":"with respect to ","element":"span"},{"style":{"height":14.8},"width":23,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-19.png","element":"img","alt":" φ","inline":true,"padRight":true},{"text":"at time instant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"is available. 
Given a discrete-time exponential control barrier function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B ","element":"span"},{"text":"and 0 ","element":"span"},{"style":{"height":14},"width":112.26,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-20.png","element":"img","alt":" < η ≤","inline":true,"padRight":true},{"text":"1, the barrier-certified safe control space is defined as","element":"span"}],[{"style":{"width":"81%"},"width":816,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-21.png","element":"img"}],[{"text":"From Proposition ","element":"span"},{"href":"#id-0","text":"III.1, ","element":"a"},{"text":"the set ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"defined in ","element":"span"},{"href":"#id-10","text":"(II.4) ","element":"a"},{"text":"is forward invariant and asymptotically stable if ","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":16},"width":94.1,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-22.png","element":"img","alt":" ∈ S (","inline":true},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":") ","element":"span"},{"text":"for all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":9.6},"width":27,"height":24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-23.png","element":"img","alt":" ∈","inline":true},{"style":{"height":14.81},"width":64.48,"height":37.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-24.png","element":"img","alt":"Z≥0","inline":true},{"text":". 
Then, the updated policy ","element":"span"},{"style":{"height":16.19},"width":48.22,"height":40.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-25.png","element":"img","alt":" φ +","inline":true,"padRight":true},{"text":"given by","element":"span"}],[{"style":{"width":"91%"},"width":921,"height":83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-26.png","element":"img"}],[{"text":"is well-known (e.g., [48], [49]) to satisfy that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"style":{"height":17.79},"width":203.78,"height":44.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-27.png","element":"img","alt":"φ(x,φ(x)) ≤","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"style":{"height":19.67},"width":209.06,"height":49.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-28.png","element":"img","alt":"φ+(x,φ +(x))","inline":true},{"text":", where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"style":{"height":12.68},"width":37.67,"height":31.7,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-29.png","element":"img","alt":"φ+","inline":true,"padRight":true},{"text":"is the action-value function with respect to ","element":"span"},{"style":{"height":16.19},"width":48.22,"height":40.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-30.png","element":"img","alt":" φ +","inline":true},{"text":". 
In practice, we use the estimate of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"style":{"height":10.8},"width":17,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-31.png","element":"img","alt":"φ","inline":true,"padRight":true},{"text":"because the exact function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"style":{"height":10.8},"width":17,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-32.png","element":"img","alt":"φ","inline":true,"padRight":true},{"text":"is unavailable. For example, the action-value function is estimated over ","element":"span"},{"style":{"fontStyle":"italic"},"text":"N","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"style":{"height":14.01},"width":100.25,"height":35.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-33.png","element":"img","alt":" ∈ Z>0","inline":true,"padRight":true},{"text":"iterations, and the policy is updated every ","element":"span"},{"style":{"fontStyle":"italic"},"text":"N","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"iterations.","element":"span"}]]},{"heading":"IV. ANALYSIS OF BARRIER-CERTIFIED ADAPTIVE REINFORCEMENT LEARNING","paragraphs":[[{"text":"In the previous section, we presented our barrier-certified adaptive reinforcement learning framework. In this section, we present theoretical analysis of our framework to further strengthen the arguments.","element":"span"}],[{"id":"id-13","style":{"fontStyle":"italic"},"text":"A. Safety Recovery: Adaptive Model Learning and Control ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Barrier Certificates","element":"span"}],[{"text":"The monotone approximation property of model parameters is closely related to Lyapunov stability. 
In fact, by augmenting the state vector with the model parameter, we can construct a Lyapunov function which guarantees stability with respect to the safe set under certain conditions.","element":"span"}],[{"text":"We first make the following assumptions.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Assumption IV.1. ","element":"span"},{"text":"1) Finite-dimensional model parameter: the dimension of the model parameter ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h ","element":"span"},{"text":"remains finite, and is ","element":"span"},{"style":{"fontStyle":"italic"},"text":"r ","element":"span"},{"style":{"height":14.01},"width":99.87,"height":35.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-34.png","element":"img","alt":" ∈ Z>0","inline":true},{"text":". 2) Boundedness of the basis functions: all of the basis functions (or kernel functions) are bounded over ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":".","element":"span"}],[{"id":"id-22","text":"3) Lipschitz continuity of the control barrier function: the ","element":"span"},{"text":"control barrier function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B ","element":"span"},{"text":"is Lipschitz continuous over ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"with Lipschitz constant ","element":"span"},{"style":{"height":10.79},"width":39.76,"height":26.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-35.png","element":"img","alt":" νB","inline":true},{"text":".","element":"span"}],[{"text":"4) Validity of barrier certificates: there exists a control input ","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"n 
","element":"span"},{"style":{"height":12.4},"width":74.4,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-36.png","element":"img","alt":" ∈ U","inline":true,"padRight":true},{"text":"satisfying for a sufficiently small ","element":"span"},{"style":{"height":12.8},"width":78.47,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-37.png","element":"img","alt":" ρ1 >","inline":true,"padRight":true},{"text":"0 that","element":"span"}],[{"id":"id-17","style":{"width":"82%"},"width":827,"height":101,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-38.png","element":"img"}],[{"text":"where ˆ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":8.8},"width":37.92,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-39.png","element":"img","alt":"+1","inline":true,"padRight":true},{"text":"is the predicted output of the current estimate ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"at ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":".","element":"span"}],[{"text":"5) Appropriate ","element":"span"},{"text":"cost ","element":"span"},{"text":"functions: ","element":"span"},{"text":"if ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":13.59},"width":178.69,"height":33.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-40.png","element":"img","alt":"∈ Ωn 
:=","inline":true,"padRight":true},{"text":"argmin","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"height":16.84},"width":159.08,"height":42.1,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-41.png","element":"img","alt":"∈Rr Θn(h)","inline":true},{"text":", where ","element":"span"},{"style":{"height":16},"width":99.88,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-42.png","element":"img","alt":" Θn(h)","inline":true,"padRight":true},{"text":"is the continuous cost function at time instant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":", then ","element":"span"},{"style":{"height":16},"width":20,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-43.png","element":"img","alt":" ∥","inline":true},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":8.8},"width":76.18,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-44.png","element":"img","alt":"+1 −","inline":true,"padRight":true},{"text":"ˆ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":21.32},"width":194.31,"height":53.3,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-45.png","element":"img","alt":"+1∥Rnx ≤ ρ1νB","inline":true,"padRight":true},{"text":".","element":"span"}],[{"text":"6) Model learning with monotone approximation property: model parameter ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"is updated as 
","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":9.61},"width":78.78,"height":24.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-46.png","element":"img","alt":"+1 =","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"T","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":")","element":"span"},{"text":", where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":": ","element":"span"},{"style":{"height":11.39},"width":143.97,"height":28.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-47.png","element":"img","alt":" Rr → Rr","inline":true,"padRight":true},{"text":"is continuous and has monotone approximation property: if ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":16},"width":83.69,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-48.png","element":"img","alt":" /∈ Ωn","inline":true},{"text":", then ","element":"span"},{"style":{"fontStyle":"italic"},"text":"dist","element":"span"},{"text":"2","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":18.04},"width":172.5,"height":45.1,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-49.png","element":"img","alt":",Ωn) ≥ ρ22","inline":true,"padRight":true},{"text":"and 
","element":"span"},{"style":{"height":16},"width":20,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-50.png","element":"img","alt":" ∥","inline":true},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":19.6},"width":191.23,"height":48.99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-51.png","element":"img","alt":" −h∗n∥2Rr −∥","inline":true},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":19.6},"width":263.24,"height":48.99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-52.png","element":"img","alt":"+1 −h∗n∥2Rr ≥ ρ23","inline":true},{"style":{"fontStyle":"italic"},"text":"dist","element":"span"},{"text":"2","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":78.81,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-53.png","element":"img","alt":",Ωn)","inline":true},{"text":", for ","element":"span"},{"text":"all ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"height":15.7},"width":108.59,"height":39.26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-54.png","element":"img","alt":"∗n ∈ Ωn","inline":true},{"text":", and for some ","element":"span"},{"style":{"height":12.8},"width":133.55,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-55.png","element":"img","alt":" ρ2,ρ3 >","inline":true,"padRight":true},{"text":"0. 
If ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":13.59},"width":82,"height":33.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-56.png","element":"img","alt":"∈ Ωn","inline":true},{"text":", then ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":9.61},"width":79.5,"height":24.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-57.png","element":"img","alt":"+1 =","inline":true,"padRight":true},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":".","element":"span"}],[{"text":"7) Data consistency: the set ","element":"span"},{"style":{"height":18.52},"width":257.7,"height":46.3,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-58.png","element":"img","alt":" Ω := ⋂n∈Z≥0 Ωn","inline":true,"padRight":true},{"text":"is nonempty.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Remark ","element":"span"},{"text":"IV.1 (On Assumption ","element":"span"},{"text":"IV.1.","element":"span"},{"text":"1)","element":"span"},{"style":{"fontStyle":"italic"},"text":". 
","element":"span"},{"text":"Assumption ","element":"span"},{"text":"IV.1.","element":"span"},{"text":"1 is made so that Lyapunov stability can be analyzed in a Euclidean space and is reasonable if polynomial kernels are employed for learning or if the input space ","element":"span"},{"style":{"height":12},"width":232.32,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/4-59.png","element":"img","alt":" Z := X ×U","inline":true,"padRight":true},{"text":"is compact.","element":"span"}],[{"id":"id-24","style":{"fontStyle":"italic"},"text":"Remark ","element":"span"},{"text":"IV.2 (On Assumptions ","element":"span"},{"text":"IV.1.","element":"span"},{"text":"2 and ","element":"span"},{"text":"IV.1.","element":"span"},{"text":"3)","element":"span"},{"style":{"fontStyle":"italic"},"text":". ","element":"span"},{"text":"Assumptions ","element":"span"},{"text":"IV.1.","element":"span"},{"text":"2 and ","element":"span"},{"text":"IV.1.","element":"span"},{"text":"3 ensure that the predicted value of the barrier function is close to its true value if the current estimate of the model parameter is close to the true parameter.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Remark ","element":"span"},{"text":"IV.3 (On Assumption ","element":"span"},{"text":"IV.1.","element":"span"},{"text":"4)","element":"span"},{"style":{"fontStyle":"italic"},"text":". ","element":"span"},{"text":"Assumption ","element":"span"},{"text":"IV.1.","element":"span"},{"text":"4 implies that we can enforce barrier certificates for the current estimate of the dynamics with a sufficiently small margin ","element":"span"},{"style":{"height":12.4},"width":36.88,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-0.png","element":"img","alt":" ρ1","inline":true},{"text":". 
This assumption is necessary to implicitly bound the growth of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":55.65,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-1.png","element":"img","alt":"+1)","inline":true,"padRight":true},{"text":"and to robustly enforce barrier certificates whenever ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":13.59},"width":80.96,"height":33.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-2.png","element":"img","alt":" ∈ Ωn","inline":true},{"text":". Although this assumption is somewhat restrictive, it is still reasonable if the initial estimate does not largely deviate from the true dynamics.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Remark ","element":"span"},{"text":"IV.4 (On Assumption ","element":"span"},{"text":"IV.1.","element":"span"},{"text":"5)","element":"span"},{"style":{"fontStyle":"italic"},"text":". 
","element":"span"},{"text":"Assumption ","element":"span"},{"text":"IV.1.","element":"span"},{"text":"5 implies that the set ","element":"span"},{"style":{"height":13.59},"width":45.6,"height":33.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-3.png","element":"img","alt":" Ωn","inline":true,"padRight":true},{"text":"or equivalently the cost ","element":"span"},{"style":{"height":13.59},"width":44.53,"height":33.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-4.png","element":"img","alt":" Θn","inline":true,"padRight":true},{"text":"is designed so that the predicted output ˆ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":8.8},"width":37.91,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-5.png","element":"img","alt":"+1","inline":true,"padRight":true},{"text":"for ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":13.59},"width":82.56,"height":33.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-6.png","element":"img","alt":" ∈ Ωn","inline":true,"padRight":true},{"text":"is sufficiently close to the true output ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":8.8},"width":37.91,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-7.png","element":"img","alt":"+1","inline":true},{"text":". Such a cost can be easily designed. 
This assumption is necessary to render the set ","element":"span"},{"style":{"height":11.6},"width":102.68,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-8.png","element":"img","alt":" C ×Ω","inline":true,"padRight":true},{"text":"forward invariant.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Remark ","element":"span"},{"text":"IV.5 (On Assumptions ","element":"span"},{"text":"IV.1.","element":"span"},{"text":"6 and ","element":"span"},{"text":"IV.1.","element":"span"},{"text":"7)","element":"span"},{"style":{"fontStyle":"italic"},"text":". ","element":"span"},{"text":"To apply theories of Lyapunov stability, Assumption ","element":"span"},{"text":"IV.1.","element":"span"},{"text":"6 is needed to make sure that the dynamical system for the augmented state is continuous. Moreover, the cost (or the set ","element":"span"},{"style":{"height":13.59},"width":45.6,"height":33.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-9.png","element":"img","alt":" Ωn","inline":true},{"text":") is designed so that ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":13.59},"width":79.56,"height":33.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-10.png","element":"img","alt":" ∈ Ωn","inline":true,"padRight":true},{"text":"or ","element":"span"},{"style":{"fontStyle":"italic"},"text":"dist","element":"span"},{"text":"2","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":18.04},"width":164.24,"height":45.1,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-11.png","element":"img","alt":",Ωn) ≥ ρ22","inline":true},{"text":". 
See the work in [47] for ","element":"span"},{"text":"a class of algorithms that satisfy this property, for example. Unless there exist some adversarial data (or inappropriate costs) that do not reflect the true agent dynamics, Assumption ","element":"span"},{"text":"IV.1.","element":"span"},{"text":"7 is valid and ensures that the set of augmented safe states is nonempty.","element":"span"}],[{"text":"Let the augmented state be ","element":"span"},{"text":"[","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"height":16.59},"width":144.48,"height":41.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-12.png","element":"img","alt":"] ∈ Rnx+r","inline":true},{"text":". Then, the following theorem states that the system for the augmented state is (asymptotically) stable with respect to the set of augmented safe states even after a violation of safety due to the abrupt and unexpected change of the agent dynamics occurs.","element":"span"}],[{"id":"id-1","style":{"fontWeight":"bold"},"text":"Theorem IV.1. 
","element":"span"},{"text":"Suppose that a triple ","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":55.65,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-13.png","element":"img","alt":"+1)","inline":true,"padRight":true},{"text":"is available at time instant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"+ ","element":"span"},{"text":"1. Suppose also that a control input ","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"satisfying ","element":"span"},{"href":"#id-17","text":"(IV.1) ","element":"a"},{"text":"is employed for all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":14.81},"width":103.03,"height":37.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-14.png","element":"img","alt":" ∈ Z≥0","inline":true},{"text":". Then, under Assumption ","element":"span"},{"text":"IV.1, ","element":"span"},{"text":"the system for the augmented state is stable with respect to the set of augmented safe states ","element":"span"},{"style":{"height":13.39},"width":237.56,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-15.png","element":"img","alt":" C ×Ω ⊂ Rnx+r","inline":true},{"text":". 
If, in addition, ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":16},"width":82.98,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-16.png","element":"img","alt":" ∉ Ωn","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":14.81},"width":101.86,"height":37.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-17.png","element":"img","alt":" ∈ Z≥0","inline":true,"padRight":true},{"text":"such that ","element":"span"},{"text":"[","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":48.88,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-18.png","element":"img","alt":"] ∉","inline":true},{"style":{"height":11.6},"width":107.37,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-19.png","element":"img","alt":"C × Ω","inline":true},{"text":", then the system is uniformly globally asymptotically stable with respect to ","element":"span"},{"style":{"height":13.39},"width":245.66,"height":33.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-20.png","element":"img","alt":" C ×Ω ⊂ Rnx+r","inline":true},{"text":".","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Proof. 
","element":"span"},{"text":"See Appendix ","element":"span"},{"text":"B.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Remark ","element":"span"},{"text":"IV.6 (On Theorem ","element":"span"},{"href":"#id-1","text":"IV.1)","element":"a"},{"style":{"fontStyle":"italic"},"text":". ","element":"span"},{"text":"Theorem ","element":"span"},{"href":"#id-1","text":"IV.1 ","element":"a"},{"text":"implies that how much the current estimate gets closer to the true dynamics depends on how much the next state of the agent deviates from the predicted next state. Therefore, both barrier certificates and model learning work together to guarantee stability. If the model learning algorithm satisfies Assumption ","element":"span"},{"text":"IV.1, ","element":"span"},{"text":"then Theorem ","element":"span"},{"href":"#id-1","text":"IV.1 ","element":"a"},{"text":"claims that safety is recovered successfully. When GPs or kernel ridge regressions are employed for model learning, for example, introducing forgetting factors or letting the sample size grow as time advances will make the algorithms adaptive to time-varying systems; in such cases, we need to make sure that the algorithms satisfy Assumption ","element":"span"},{"text":"IV.1 ","element":"span"},{"text":"to guarantee safety recovery. Numerical simulations of safety recovery are given in Section ","element":"span"},{"href":"#id-18","text":"V-A.","element":"a"}],[{"text":"If the agent dynamics keeps changing or if we know that there are multiple modes for dynamics, then we may have separate model learning processes as proposed in [50], and the augmented state can be regarded as following a hybrid system. Hence, stability should be analyzed under additional assumptions in this case. We leave such an analysis as future work.","element":"span"}],[{"id":"id-3","style":{"fontStyle":"italic"},"text":"B. 
Structured Model Learning","element":"span"}],[{"text":"We have seen that, by employing a model learning algorithm with the monotone approximation property under Assumption ","element":"span"},{"text":"IV.1, ","element":"span"},{"text":"the agent is stabilized on the set of augmented safe states even after an abrupt and unexpected change of the agent dynamics. Here, we show that a control-affine dynamics can be learned via sparse optimization satisfying the monotone approximation property in a properly defined RKHS. We assume that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"fontStyle":"italic"},"text":"x ","element":"span"},{"text":"= ","element":"span"},{"text":"1 for simplicity (we can employ ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"fontStyle":"italic"},"text":"x ","element":"span"},{"text":"approximators if ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"fontStyle":"italic"},"text":"x ","element":"span"},{"style":{"fontStyle":"italic"},"text":"> ","element":"span"},{"text":"1).","element":"span"}],[{"text":"First, we show that the space ","element":"span"},{"style":{"height":13.99},"width":51.61,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-21.png","element":"img","alt":" Hc","inline":true,"padRight":true},{"text":"(see Section ","element":"span"},{"href":"#id-2","text":"III-C) ","element":"a"},{"text":"is an RKHS.","element":"span"}],[{"id":"id-19","style":{"fontWeight":"bold"},"text":"Lemma IV.1. 
","element":"span"},{"text":"The space ","element":"span"},{"style":{"height":13.99},"width":51.62,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-22.png","element":"img","alt":" Hc","inline":true,"padRight":true},{"text":"is an RKHS associated with the reproducing kernel ","element":"span"},{"style":{"height":16},"width":507.34,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-23.png","element":"img","alt":" κ(u,v) = 1(u) := 1,∀u,v ∈ U","inline":true,"padRight":true},{"text":", with the inner product defined as ","element":"span"},{"style":{"height":18.34},"width":467.49,"height":45.86,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-24.png","element":"img","alt":" ⟨α1,β1⟩Hc := αβ, α,β ∈ R","inline":true},{"text":".","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"See Appendix ","element":"span"},{"text":"C.","element":"span"}],[{"text":"Then, the following lemma implies that ","element":"span"},{"style":{"height":12},"width":32,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-25.png","element":"img","alt":" ψ","inline":true,"padRight":true},{"text":"can be approximated in the sum space of RKHSs denoted by ","element":"span"},{"style":{"height":16.79},"width":61.62,"height":41.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-26.png","element":"img","alt":" Hψ","inline":true},{"text":".","element":"span"}],[{"id":"id-20","style":{"fontWeight":"bold"},"text":"Lemma IV.2 ","element":"span"},{"text":"( [51, Theorem 13])","element":"span"},{"style":{"fontWeight":"bold"},"text":". 
","element":"span"},{"text":"Let ","element":"span"},{"style":{"height":14.01},"width":53.61,"height":35.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-27.png","element":"img","alt":" H1","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":14.01},"width":53.61,"height":35.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-28.png","element":"img","alt":" H2","inline":true,"padRight":true},{"text":"be two RKHSs associated with the reproducing kernels ","element":"span"},{"style":{"height":10.81},"width":36.88,"height":27.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-29.png","element":"img","alt":" κ1","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":10.81},"width":36.88,"height":27.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-30.png","element":"img","alt":"κ2","inline":true},{"text":". Then the completion of the tensor product of ","element":"span"},{"style":{"height":14.01},"width":53.62,"height":35.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-31.png","element":"img","alt":" H1","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":14.01},"width":53.62,"height":35.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-32.png","element":"img","alt":"H2","inline":true},{"text":", denoted by ","element":"span"},{"style":{"height":14.01},"width":153.35,"height":35.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-33.png","element":"img","alt":" H1 ⊗ H2","inline":true},{"text":", is an RKHS associated with the reproducing kernel ","element":"span"},{"style":{"height":11.61},"width":117.52,"height":29.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-34.png","element":"img","alt":" κ1 ⊗κ2","inline":true},{"text":".","element":"span"}],[{"text":"From Lemmas 
","element":"span"},{"href":"#id-19","text":"IV.1 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-20","text":"IV.2, ","element":"a"},{"text":"we can now assume that ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"style":{"height":16.44},"width":191.05,"height":41.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-35.png","element":"img","alt":" ∈ Hf ⊗ Hc","inline":true,"padRight":true},{"text":"and ˆ˜","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"style":{"height":16.39},"width":190.86,"height":40.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-36.png","element":"img","alt":" ∈ Hg ⊗ Hu","inline":true},{"text":", where ˆ˜","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"is an estimate of ˜","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"text":") ","element":"span"},{"text":":","element":"span"},{"text":"= ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"text":". As such, ","element":"span"},{"style":{"height":12},"width":32,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-37.png","element":"img","alt":" ψ","inline":true,"padRight":true},{"text":"can be approximated in the RKHS ","element":"span"},{"style":{"height":16.79},"width":565.02,"height":41.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-38.png","element":"img","alt":" Hψ := Hp +H f ⊗Hc +Hg ⊗Hu","inline":true},{"text":". 
Therefore, we can employ a kernel adaptive filter working in the sum space ","element":"span"},{"style":{"height":16.79},"width":61.62,"height":41.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-39.png","element":"img","alt":" Hψ","inline":true},{"text":".","element":"span"}],[{"text":"Second, the following theorem ensures that ","element":"span"},{"style":{"height":12},"width":32,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-40.png","element":"img","alt":" ψ","inline":true,"padRight":true},{"text":"can be uniquely decomposed into ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p","element":"span"},{"text":", ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":", and ˜","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"in the RKHS ","element":"span"},{"style":{"height":16.79},"width":61.62,"height":41.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-41.png","element":"img","alt":" Hψ","inline":true},{"text":".","element":"span"}],[{"id":"id-14","style":{"fontWeight":"bold"},"text":"Theorem IV.2. ","element":"span"},{"text":"Assume that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"U ","element":"span"},{"text":"have nonempty interiors. Assume also that ","element":"span"},{"style":{"height":16.39},"width":58.83,"height":40.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-42.png","element":"img","alt":" Hp","inline":true,"padRight":true},{"text":"is a Gaussian RKHS. 
Then, ","element":"span"},{"style":{"height":16.79},"width":61.62,"height":41.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-43.png","element":"img","alt":"Hψ","inline":true,"padRight":true},{"text":"is the direct sum of ","element":"span"},{"style":{"height":16.44},"width":236.62,"height":41.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-44.png","element":"img","alt":" Hp, Hf ⊗ Hc","inline":true},{"text":", and ","element":"span"},{"style":{"height":16.39},"width":152.69,"height":40.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-45.png","element":"img","alt":" Hg ⊗ Hu","inline":true},{"text":", i.e., the intersection of any two of the RKHSs ","element":"span"},{"style":{"height":16.44},"width":230.65,"height":41.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-46.png","element":"img","alt":" Hp, H f ⊗Hc","inline":true},{"text":", and ","element":"span"},{"style":{"height":16.39},"width":151,"height":40.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-47.png","element":"img","alt":"Hg ⊗Hu","inline":true,"padRight":true},{"text":"is ","element":"span"},{"style":{"fontStyle":"italic"},"text":"{","element":"span"},{"text":"0","element":"span"},{"style":{"fontStyle":"italic"},"text":"}","element":"span"},{"text":".","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"See Appendix ","element":"span"},{"text":"D.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Remark ","element":"span"},{"text":"IV.7 (On Theorem ","element":"span"},{"href":"#id-14","text":"IV.2)","element":"a"},{"style":{"fontStyle":"italic"},"text":". 
","element":"span"},{"text":"Because only the control-affine part of the learned model is used in combination with barrier certificates (see Assumption ","element":"span"},{"text":"IV.2 ","element":"span"},{"text":"and Theorem ","element":"span"},{"href":"#id-5","text":"IV.4) ","element":"a"},{"text":"and the term ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"is assumed to be a system noise added to the control-affine dynamics, the unique decomposition is crucial; if the unique decomposition does not hold, the term ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"may be able to estimate the overall dynamics, including the control-affine terms.","element":"span"}],[{"text":"By using a sparse optimization for the coefficient vector ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":11.79},"width":78.22,"height":29.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/5-48.png","element":"img","alt":" ∈ Rr","inline":true},{"text":", we wish to extract a structure of the model; from Theorem ","element":"span"},{"href":"#id-14","text":"IV.2, ","element":"a"},{"text":"the term ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"is expected to drop off when the true agent dynamics is affine in control.","element":"span"}],[{"text":"In ","element":"span"},{"text":"order ","element":"span"},{"text":"to ","element":"span"},{"text":"use ","element":"span"},{"text":"the ","element":"span"},{"text":"learned ","element":"span"},{"text":"model ","element":"span"},{"text":"in ","element":"span"},{"text":"combination with control barrier functions, each entry of the vector 
ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":") ","element":"span"},{"text":"is required. Assume, without loss of generality, that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"{","element":"span"},{"style":{"fontWeight":"bold"},"text":"e","element":"span"},{"style":{"fontStyle":"italic"},"text":"i","element":"span"},{"style":{"height":18.07},"width":279.72,"height":45.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-0.png","element":"img","alt":"}i∈{1,2,...,nu} ⊂ U","inline":true,"padRight":true},{"text":"(this is always possible for ","element":"span"},{"style":{"height":15.2},"width":84.78,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-1.png","element":"img","alt":" U ≠","inline":true,"padRight":true},{"text":"∅ by transforming coordinates of the control inputs and reducing the dimension ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"fontStyle":"italic"},"text":"u ","element":"span"},{"text":"if necessary). 
Then, the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i","element":"span"},{"text":"th entry of the vector ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":") ","element":"span"},{"text":"is given by ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":"e","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"= ","element":"span"},{"text":"ˆ","element":"span"},{"text":"˜","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"style":{"fontWeight":"bold"},"text":"e","element":"span"},{"style":{"fontStyle":"italic"},"text":"i","element":"span"},{"text":")","element":"span"},{"text":". As such, we can use the learned model to constrain control inputs efficiently by using control barrier functions for explorations as well as policy updates. We analyze an adaptive action-value function approximation with barrier-certified policy updates in the next subsection.","element":"span"}],[{"id":"id-4","style":{"fontStyle":"italic"},"text":"C. 
Adaptive ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Action-value ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Approximation ","element":"span"},{"style":{"fontStyle":"italic"},"text":"with Barrier-certified Policy Updates","element":"span"}],[{"text":"In this subsection, we analyze the proposed adaptive action-value function approximation with barrier-certified policy updates.","element":"span"}],[{"text":"We ","element":"span"},{"text":"showed ","element":"span"},{"text":"in ","element":"span"},{"text":"Section ","element":"span"},{"href":"#id-21","text":"III-D.1 ","element":"a"},{"text":"that ","element":"span"},{"text":"the ","element":"span"},{"text":"Bellman equation ","element":"span"},{"text":"in ","element":"span"},{"href":"#id-16","text":"(II.2) ","element":"a"},{"text":"is ","element":"span"},{"text":"solved ","element":"span"},{"text":"via ","element":"span"},{"text":"iterative ","element":"span"},{"text":"nonlinear function ","element":"span"},{"text":"estimation ","element":"span"},{"text":"with ","element":"span"},{"text":"the ","element":"span"},{"text":"input-output ","element":"span"},{"text":"pairs 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"{","element":"span"},{"text":"([","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":96.38,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-2.png","element":"img","alt":"+1;φ(","inline":true},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":121.96,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-3.png","element":"img","alt":"+1)],R(","inline":true},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":18.18},"width":135.47,"height":45.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-4.png","element":"img","alt":"))}n∈Z≥0","inline":true},{"text":". 
","element":"span"},{"text":"The ","element":"span"},{"text":"following theorem states that the function ","element":"span"},{"style":{"height":16.99},"width":51.64,"height":42.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-5.png","element":"img","alt":" ψQ","inline":true,"padRight":true},{"text":"defined in ","element":"span"},{"href":"#id-22","text":"(III.2) ","element":"a"},{"text":"can be estimated in a properly constructed RKHS.","element":"span"}],[{"id":"id-7","style":{"fontWeight":"bold"},"text":"Theorem IV.3. ","element":"span"},{"text":"Suppose that ","element":"span"},{"style":{"height":16.04},"width":59.62,"height":40.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-6.png","element":"img","alt":" HQ","inline":true,"padRight":true},{"text":"is an RKHS associated with the reproducing kernel ","element":"span"},{"style":{"height":17.39},"width":357,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-7.png","element":"img","alt":" κQ(·,·) : Z ×Z → R","inline":true},{"text":". 
Define, for ","element":"span"},{"style":{"height":12.4},"width":55.23,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-8.png","element":"img","alt":" γ ∈","inline":true,"padRight":true},{"text":"(","element":"span"},{"text":"0","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"text":"1","element":"span"},{"text":")","element":"span"},{"text":",","element":"span"}],[{"style":{"width":"66%"},"width":671,"height":114,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-9.png","element":"img"}],[{"text":"Then, ","element":"span"},{"text":"the ","element":"span"},{"text":"operator ","element":"span"},{"style":{"fontStyle":"italic"},"text":"U ","element":"span"},{"text":": ","element":"span"},{"style":{"height":18.75},"width":235.09,"height":46.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-10.png","element":"img","alt":" HQ → HψQ","inline":true,"padRight":true},{"text":"defined ","element":"span"},{"text":"by ","element":"span"},{"style":{"fontStyle":"italic"},"text":"U","element":"span"},{"style":{"height":17.39},"width":108.01,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-11.png","element":"img","alt":"(ϕQ)([","inline":true},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"w","element":"span"},{"style":{"height":17.83},"width":579.46,"height":44.57,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-12.png","element":"img","alt":"]) := ϕQ(z) − γϕQ(w), ∀ϕQ ∈ HQ","inline":true},{"text":", is bijective. 
Moreover, ","element":"span"},{"style":{"height":18.75},"width":78.29,"height":46.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-13.png","element":"img","alt":" HψQ","inline":true,"padRight":true},{"text":"is an RKHS with the inner product defined by","element":"span"}],[{"style":{"height":33.81},"width":488.54,"height":84.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-14.png","element":"img","alt":"⟨ϕ1,ϕ2⟩HψQ := ⟨ϕQ1,ϕQ2⟩HQ,","inline":true,"padRight":true},{"text":"(IV.2)","element":"span"}],[{"style":{"width":"85%"},"width":856,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-15.png","element":"img"}],[{"text":"The reproducing kernel of the RKHS ","element":"span"},{"style":{"height":18.75},"width":78.29,"height":46.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-16.png","element":"img","alt":" HψQ","inline":true,"padRight":true},{"text":"is given by","element":"span"}],[{"id":"id-23","style":{"width":"91%"},"width":920,"height":114,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-17.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"See Appendix ","element":"span"},{"text":"E.","element":"span"}],[{"text":"From Theorem ","element":"span"},{"href":"#id-7","text":"IV.3, ","element":"a"},{"text":"we can use any kernel-based method by assuming that the action-value function is in ","element":"span"},{"style":{"height":16.04},"width":59.62,"height":40.1,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-18.png","element":"img","alt":" HQ","inline":true},{"text":". 
The estimate of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"style":{"height":10.8},"width":17,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-19.png","element":"img","alt":"φ","inline":true,"padRight":true},{"text":"denoted by ","element":"span"},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"style":{"height":10.8},"width":17,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-20.png","element":"img","alt":"φ","inline":true,"padRight":true},{"text":"is obtained by ","element":"span"},{"style":{"fontStyle":"italic"},"text":"U","element":"span"},{"style":{"height":17.39},"width":125.04,"height":43.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-21.png","element":"img","alt":"−1( ˆψQ)","inline":true},{"text":", where ˆ","element":"span"},{"style":{"height":16.99},"width":51.64,"height":42.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-22.png","element":"img","alt":"ψQ","inline":true,"padRight":true},{"text":"is the estimate of ","element":"span"},{"style":{"height":20.54},"width":173.46,"height":51.34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-23.png","element":"img","alt":" ψQ ∈ HψQ","inline":true},{"text":". 
For instance, suppose that the estimate of ","element":"span"},{"style":{"height":17.39},"width":147.32,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-24.png","element":"img","alt":" ψQ(z,w)","inline":true,"padRight":true},{"text":"for an input ","element":"span"},{"text":"[","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"w","element":"span"},{"text":"] ","element":"span"},{"text":"at time instant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"is given by","element":"span"}],[{"style":{"width":"43%"},"width":439,"height":55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-25.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":11.79},"width":81.85,"height":29.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-26.png","element":"img","alt":"∈ Rr","inline":true,"padRight":true},{"text":"is the model parameter, and ","element":"span"},{"style":{"fontWeight":"bold"},"text":"k","element":"span"},{"text":"([","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"w","element":"span"},{"text":"]) ","element":"span"},{"text":":","element":"span"},{"text":"= ","element":"span"},{"style":{"height":16},"width":67.11,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-27.png","element":"img","alt":"[κ 
([","inline":true},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"w","element":"span"},{"text":"]","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"text":"[","element":"span"},{"text":"˜","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"text":"1","element":"span"},{"text":"; ˜","element":"span"},{"style":{"fontWeight":"bold"},"text":"w","element":"span"},{"text":"1","element":"span"},{"style":{"height":16},"width":102.51,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-28.png","element":"img","alt":"]);κ ([","inline":true},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"w","element":"span"},{"text":"]","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"text":"[","element":"span"},{"text":"˜","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"text":"2","element":"span"},{"text":"; ˜","element":"span"},{"style":{"fontWeight":"bold"},"text":"w","element":"span"},{"text":"2","element":"span"},{"style":{"height":16},"width":168.84,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-29.png","element":"img","alt":"]);··· ;κ ([","inline":true},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"w","element":"span"},{"text":"]","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"text":"[","element":"span"},{"text":"˜","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"fontStyle":"italic"},"text":"r","element":"span"},{"text":"; 
˜","element":"span"},{"style":{"fontWeight":"bold"},"text":"w","element":"span"},{"style":{"fontStyle":"italic"},"text":"r","element":"span"},{"style":{"height":16},"width":86.44,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-30.png","element":"img","alt":"])] ∈","inline":true}],[{"style":{"height":10.99},"width":40.78,"height":27.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-31.png","element":"img","alt":"Rr","inline":true,"padRight":true},{"text":"for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"{","element":"span"},{"text":"˜","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"fontStyle":"italic"},"text":"j","element":"span"},{"style":{"height":18.07},"width":210.52,"height":45.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-32.png","element":"img","alt":"}j∈{1,2,...,r},{","inline":true,"padRight":true},{"text":"˜","element":"span"},{"style":{"fontWeight":"bold"},"text":"w ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j","element":"span"},{"style":{"height":18.07},"width":267.14,"height":45.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-33.png","element":"img","alt":"}j∈{1,2,...,r} ⊂ Z","inline":true,"padRight":true},{"text":"and for ","element":"span"},{"style":{"height":16},"width":94.19,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-34.png","element":"img","alt":" κ(·,·)","inline":true,"padRight":true},{"text":"defined by ","element":"span"},{"href":"#id-23","text":"(IV.3)","element":"a"},{"text":". 
Then, the estimate of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"style":{"height":17.79},"width":69.81,"height":44.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-35.png","element":"img","alt":"φ(z)","inline":true,"padRight":true},{"text":"for an input ","element":"span"},{"style":{"fontWeight":"bold"},"text":"z ","element":"span"},{"text":"at time instant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"is given by","element":"span"}],[{"id":"id-25","style":{"width":"99%"},"width":1003,"height":183,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-36.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Remark ","element":"span"},{"text":"IV.8 (On Theorem ","element":"span"},{"href":"#id-7","text":"IV.3)","element":"a"},{"style":{"fontStyle":"italic"},"text":". ","element":"span"},{"text":"As discussed in Appendix ","element":"span"},{"text":"I, ","element":"span"},{"text":"the GP SARSA is reproduced by applying a GP in the space ","element":"span"},{"style":{"height":18.75},"width":78.29,"height":46.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-37.png","element":"img","alt":" HψQ","inline":true},{"text":", although the GP SARSA or other kernel-based action-value function approximation is ad-hoc and designed for estimating the action-value function associated with a fixed policy under a stationary agent dynamics.","element":"span"}],[{"text":"When the parameter ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"for the estimator 
ˆ","element":"span"},{"style":{"height":18.83},"width":51.64,"height":47.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-38.png","element":"img","alt":"ψQn","inline":true,"padRight":true},{"text":"is monotonically ","element":"span"},{"text":"approaching an optimal point ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"style":{"height":5.6},"width":15,"height":14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-39.png","element":"img","alt":"∗","inline":true,"padRight":true},{"text":"in the Euclidean norm sense, so is the model parameter for the action-value function, because the same parameter is used to estimate ","element":"span"},{"style":{"height":16.99},"width":51.64,"height":42.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-40.png","element":"img","alt":" ψQ","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"style":{"height":10.8},"width":17,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-41.png","element":"img","alt":"φ","inline":true},{"text":". Suppose we employ a method that monotonically brings ˆ","element":"span"},{"style":{"height":18.83},"width":51.64,"height":47.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-42.png","element":"img","alt":"ψQn","inline":true,"padRight":true},{"text":"closer to an optimal function ","element":"span"},{"style":{"height":18.36},"width":68.93,"height":45.9,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-43.png","element":"img","alt":" ψQ∗","inline":true,"padRight":true},{"text":"in the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Hilbertian ","element":"span"},{"text":"norm sense. 
Then, the following corollary implies that an estimator of the action-value function also satisfies the monotonicity.","element":"span"}],[{"id":"id-15","style":{"fontWeight":"bold"},"text":"Corollary IV.1. ","element":"span"},{"text":"Let ","element":"span"},{"style":{"height":22.38},"width":213.76,"height":55.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-44.png","element":"img","alt":" HψQ ∋ ˆψQn ([","inline":true},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"w","element":"span"},{"text":"]) ","element":"span"},{"text":":","element":"span"},{"text":"= ","element":"span"},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"style":{"height":20.14},"width":138.7,"height":50.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-45.png","element":"img","alt":"φn(z) − γ","inline":true,"padRight":true},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"style":{"height":20.14},"width":80.89,"height":50.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-46.png","element":"img","alt":"φn(w)","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":21.91},"width":238.81,"height":54.78,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-47.png","element":"img","alt":" HψQ ∋ ψQ∗([","inline":true},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"w","element":"span"},{"style":{"height":18.76},"width":608.28,"height":46.9,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-48.png","element":"img","alt":"]) := Qφ ∗(z) − γQφ ∗(w), z,w ∈ Z","inline":true,"padRight":true},{"text":", where 
","element":"span"},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"style":{"height":20.59},"width":211.38,"height":51.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-49.png","element":"img","alt":"φn,Qφ ∗ ∈ HQ","inline":true},{"text":". Then, if ˆ","element":"span"},{"style":{"height":18.83},"width":51.64,"height":47.07,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-50.png","element":"img","alt":"ψQn","inline":true,"padRight":true},{"text":"is approaching to ","element":"span"},{"style":{"height":18.36},"width":68.93,"height":45.9,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-51.png","element":"img","alt":" ψQ∗","inline":true},{"text":", ","element":"span"},{"text":"i.e.,","element":"span"},{"style":{"height":37.62},"width":647.14,"height":94.05,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-52.png","element":"img","alt":"�� ˆψQn+1 −ψQ∗��HψQ ≤�� ˆψQn −ψQ∗��HψQ","inline":true},{"text":", it follows that ","element":"span"},{"style":{"height":29.53},"width":22,"height":73.82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-53.png","element":"img","alt":"��","inline":true,"padRight":true},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"style":{"height":34.45},"width":305.21,"height":86.13,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-54.png","element":"img","alt":"φn+1 −Qφ ∗��HQ≤��","inline":true,"padRight":true},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"style":{"height":34.45},"width":216.23,"height":86.13,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-55.png","element":"img","alt":"φn −Qφ ∗��HQ.","inline":true}],[{"style":{"fontStyle":"italic"},"text":"Proof. 
","element":"span"},{"text":"See Appendix ","element":"span"},{"text":"F.","element":"span"}],[{"text":"Note that the use of action-value functions enables us to use random control inputs instead of the target policy ","element":"span"},{"style":{"height":14.8},"width":23,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-56.png","element":"img","alt":" φ","inline":true,"padRight":true},{"text":"for exploration, and we require no models of the agent dynamics for policy updates as discussed below.","element":"span"}],[{"text":"To obtain analytical solutions for ","element":"span"},{"href":"#id-24","text":"(III.3)","element":"a"},{"text":", we follow the arguments in [37]. Suppose that ","element":"span"},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"style":{"height":18.34},"width":17,"height":45.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-57.png","element":"img","alt":"φn","inline":true,"padRight":true},{"text":"is given by ","element":"span"},{"href":"#id-25","text":"(IV.4)","element":"a"},{"text":". 
We ","element":"span"},{"id":"id-53","text":"define the reproducing kernel ","element":"span"},{"style":{"height":13.39},"width":46.19,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-58.png","element":"img","alt":" κQ","inline":true,"padRight":true},{"text":"of ","element":"span"},{"style":{"height":16.04},"width":59.61,"height":40.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-59.png","element":"img","alt":" HQ","inline":true,"padRight":true},{"text":"as the tensor kernel given by","element":"span"}],[{"style":{"width":"91%"},"width":920,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-60.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":16},"width":130.93,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-61.png","element":"img","alt":" κu(u,v)","inline":true,"padRight":true},{"text":"is, for example, defined by","element":"span"}],[{"style":{"width":"38%"},"width":382,"height":81,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-62.png","element":"img"}],[{"text":"Then, ","element":"span"},{"href":"#id-24","text":"(III.3) ","element":"a"},{"text":"becomes","element":"span"}],[{"id":"id-26","style":{"width":"91%"},"width":921,"height":91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-63.png","element":"img"}],[{"text":"where the target value being maximized is linear in ","element":"span"},{"style":{"fontWeight":"bold"},"text":"u ","element":"span"},{"text":"at ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":". 
Therefore, if the set ","element":"span"},{"style":{"height":16},"width":178.24,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-64.png","element":"img","alt":" S (x) ⊂ U","inline":true,"padRight":true},{"text":"is convex, an optimal solution to ","element":"span"},{"href":"#id-26","text":"(IV.6) ","element":"a"},{"text":"is guaranteed to be globally optimal, ensuring the greedy improvement of the policy.","element":"span"}],[{"text":"As pointed out in [24], ","element":"span"},{"style":{"height":16},"width":182.73,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/6-65.png","element":"img","alt":" S (x) ⊂ U","inline":true,"padRight":true},{"text":"is not a convex set in general. Instead, we consider a convex subset of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":") ","element":"span"},{"text":"under the following moderate assumptions:","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Assumption IV.2. 
","element":"span"},{"text":"1) The set ","element":"span"},{"style":{"fontStyle":"italic"},"text":"U ","element":"span"},{"text":"is convex.","element":"span"}],[{"text":"2) Existence of Lipschitz continuous gradient of the barrier function: Given","element":"span"}],[{"style":{"height":16},"width":222.23,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/7-0.png","element":"img","alt":"R := {(1−t)","inline":true},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"+","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":"( ","element":"span"},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":")+ ","element":"span"},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":353.47,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/7-1.png","element":"img","alt":")u)|t ∈ [0,1],u ∈ U },","inline":true}],[{"text":"there exists a constant ","element":"span"},{"style":{"height":12.8},"width":65.25,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/7-2.png","element":"img","alt":" ν ≥","inline":true,"padRight":true},{"text":"0 such that the gradient of the discrete-time exponential control barrier function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":", denoted by 
","element":"span"},{"style":{"height":22.44},"width":73.43,"height":56.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/7-3.png","element":"img","alt":"∂B(x)∂x","inline":true,"padRight":true},{"text":", satisfies","element":"span"}],[{"style":{"width":"79%"},"width":800,"height":99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/7-4.png","element":"img"}],[{"text":"Then, the following theorem holds.","element":"span"}],[{"id":"id-5","style":{"fontWeight":"bold"},"text":"Theorem IV.4. ","element":"span"},{"text":"Under Assumptions ","element":"span"},{"text":"IV.1.","element":"span"},{"text":"3 and ","element":"span"},{"text":"IV.2, ","element":"span"},{"text":"assume also that ","element":"span"},{"style":{"height":19.96},"width":22,"height":49.91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/7-5.png","element":"img","alt":"��","inline":true},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":97.68,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/7-6.png","element":"img","alt":"+1 −(","inline":true,"padRight":true},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":")+ 
","element":"span"},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"+","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":22},"width":182.78,"height":54.99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/7-7.png","element":"img","alt":")��Rnx ≤ ρ1νB","inline":true,"padRight":true},{"text":". Then, ","element":"span"},{"text":"inequality ","element":"span"},{"text":"(III.1) ","element":"span"},{"text":"is satisfied at time instant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":14.81},"width":103.52,"height":37.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/7-8.png","element":"img","alt":" ∈ Z≥0","inline":true,"padRight":true},{"text":"if ","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"satisfies the following:","element":"span"}],[{"id":"id-27","style":{"width":"91%"},"width":919,"height":172,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/7-9.png","element":"img"}],[{"text":"Moreover, ","element":"span"},{"href":"#id-27","text":"(IV.7) ","element":"a"},{"text":"defines a convex constraint for ","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":".","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Proof. 
","element":"span"},{"text":"See Appendix ","element":"span"},{"text":"G.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Remark ","element":"span"},{"text":"IV.9","element":"span"},{"style":{"fontStyle":"italic"},"text":". ","element":"span"},{"text":"When ","element":"span"},{"style":{"height":22.44},"width":87.38,"height":56.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/7-10.png","element":"img","alt":"∂B(xn)∂x","inline":true,"padRight":true},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":58.52,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/7-11.png","element":"img","alt":") ̸=","inline":true,"padRight":true},{"text":"0 and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"U ","element":"span"},{"text":"admits suffi-ciently large value of each entry of ","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":", there always exists a ","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"that satisfies ","element":"span"},{"href":"#id-27","text":"(IV.7)","element":"a"},{"text":".","element":"span"}],[{"text":"Theorem ","element":"span"},{"href":"#id-5","text":"IV.4 ","element":"a"},{"text":"essentially implies that, even when the gradient of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B ","element":"span"},{"text":"along the shift of ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"decreases steeply, 
inequality ","element":"span"},{"text":"(III.1) ","element":"span"},{"text":"holds if ","element":"span"},{"href":"#id-27","text":"(IV.7) ","element":"a"},{"text":"is satisfied. From Theorem ","element":"span"},{"href":"#id-5","text":"IV.4, ","element":"a"},{"text":"the set ˆ","element":"span"},{"style":{"height":16},"width":67.29,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/7-12.png","element":"img","alt":"Sn(","inline":true},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":")","element":"span"},{"text":", defined as","element":"span"}],[{"style":{"width":"92%"},"width":925,"height":212,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/7-13.png","element":"img"}],[{"text":"is convex under Assumption ","element":"span"},{"text":"IV.2.","element":"span"}],[{"text":"As reported in the literature (e.g., [22]), an agent might encounter deadlock situations when control barrier certificates are employed, where the constrained control keeps the agent in the same state. It is even possible that no safe control drives the agent away from those states. However, a careful design of control barrier functions can remedy this issue, as shown in the following example.","element":"span"}],[{"id":"id-38","style":{"fontWeight":"bold"},"text":"Example IV.1. ","element":"span"},{"text":"If the agent is nonholonomic, turning inward into safe regions when approaching their boundaries might be infeasible. 
To reduce the risk of such deadlock situations, control barrier functions may be designed as","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":") = ","element":"span"},{"text":"˜","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"style":{"height":39.09},"width":252.12,"height":97.73,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/7-14.png","element":"img","alt":"(x)−υΓ��θ −","inline":true},{"text":"atan2","element":"span"},{"style":{"height":38.4},"width":57.67,"height":96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/7-15.png","element":"img","alt":"�∂","inline":true,"padRight":true},{"text":"˜","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"style":{"height":37.33},"width":120.22,"height":93.32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/7-16.png","element":"img","alt":"(x)∂y , ∂","inline":true,"padRight":true},{"text":"˜","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"style":{"height":34.12},"width":72.73,"height":85.31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/7-17.png","element":"img","alt":"(x)∂","inline":true},{"text":"x","element":"span"}],[{"style":{"width":"10%"},"width":101,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/7-18.png","element":"img"}],[{"text":"where the state ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x ","element":"span"},{"text":"= [","element":"span"},{"text":"x;y;","element":"span"},{"style":{"height":16},"width":36.34,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/7-19.png","element":"img","alt":"θ]","inline":true,"padRight":true},{"text":"consists of the X position x, the Y position y, and the orientation 
","element":"span"},{"style":{"height":11.6},"width":23,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/7-20.png","element":"img","alt":" θ","inline":true,"padRight":true},{"text":"of an agent from the world frame, ","element":"span"},{"style":{"height":16},"width":141.91,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/7-21.png","element":"img","alt":" {x ∈ X |","inline":true,"padRight":true},{"text":"˜","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"style":{"height":16},"width":138.27,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/7-22.png","element":"img","alt":"(x) ≥ 0}","inline":true,"padRight":true},{"text":"is the original safe region, and ","element":"span"},{"style":{"height":10.8},"width":25,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/7-23.png","element":"img","alt":" Γ","inline":true,"padRight":true},{"text":"is a strictly increasing function. If this control barrier function exists, then the agent is forced to turn inward the original","element":"span"}],[{"style":{"width":"88%"},"width":884,"height":343,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/7-24.png","element":"img"}],[{"text":"Fig. IV.1. ","element":"figcaption","subtype":"caption"},{"id":"id-28","text":"An illustration of how a nonholonomic agent avoids deadlocks. 
","element":"figcaption","subtype":"caption"},{"text":"When the orientation of the agent is not considered (i.e., ","element":"figcaption","subtype":"caption"},{"text":"˜","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"B","element":"figcaption","subtype":"caption"},{"text":"(","element":"figcaption","subtype":"caption"},{"style":{"fontWeight":"bold"},"text":"x","element":"figcaption","subtype":"caption"},{"text":") ","element":"figcaption","subtype":"caption"},{"text":"is the barrier function), there might be no safe control driving the agent from those states as the left figure shows. By taking into account the orientation (i.e., ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"B","element":"figcaption","subtype":"caption"},{"text":"(","element":"figcaption","subtype":"caption"},{"style":{"fontWeight":"bold"},"text":"x","element":"figcaption","subtype":"caption"},{"text":") ","element":"figcaption","subtype":"caption"},{"text":"is the barrier function), the agent turns inward the safe region before reaching its boundaries as the right figure shows.","element":"figcaption","subtype":"caption"}],[{"id":"id-29","style":{"width":"100%"},"width":1006,"height":1185,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/7-25.png","element":"img"}],[{"text":"safe region before reaching its boundaries because the control barrier function also depends on ","element":"span"},{"style":{"height":11.6},"width":23,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/7-26.png","element":"img","alt":" θ","inline":true,"padRight":true},{"text":"and takes larger value when the agent is facing inward the safe region. 
An illustration of this example is given in Figure ","element":"span"},{"href":"#id-28","text":"IV.1.","element":"a"}],[{"text":"The resulting barrier-certified adaptive reinforcement learning framework is summarized in Algorithm ","element":"span"},{"href":"#id-29","text":"1.","element":"a"}]]},{"heading":"V. EXPERIMENTAL RESULTS","paragraphs":[[{"id":"id-6","text":"For the sake of reproducibility and to clarify each ","element":"span"},{"text":"contribution, we first validate the proposed learning framework on simulations of vertical movements of a quadrotor, which have been used in the safe learning literature under stationarity assumptions (e.g., [7]). Then, we test the proposed learning framework on a real robot called ","element":"span"},{"style":{"fontStyle":"italic"},"text":"brushbot","element":"span"},{"text":", whose dynamics is unknown, highly complex and nonstationary","element":"span"},{"text":"1","element":"span"},{"text":". The experiments on the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"brushbot ","element":"span"},{"text":"were conducted at the Robotarium, a remotely accessible robot testbed at the Georgia Institute of Technology [52].","element":"span"}],[{"id":"id-18","style":{"fontStyle":"italic"},"text":"A. 
Validations of the Safe Learning Framework via Simulations ","element":"span"},{"style":{"fontStyle":"italic"},"text":"of a Quadrotor","element":"span"}],[{"text":"In this experiment, we empirically validate Theorem ","element":"span"},{"href":"#id-1","text":"IV.1 ","element":"a"},{"text":"(i.e., Lyapunov stability of the set of augmented safe states after an unexpected and abrupt change of the agent dynamics) and the motivation for using an online kernel method working in the RKHS ","element":"span"},{"style":{"height":18.75},"width":78.29,"height":46.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/8-0.png","element":"img","alt":" HψQ","inline":true,"padRight":true},{"text":"(see Section ","element":"span"},{"href":"#id-4","text":"IV-C) ","element":"a"},{"text":"for action-value function approximation. We also test the proposed framework for simulated vertical movements of a quadrotor. We use a parametric model for the agent dynamics and a nonparametric model for the action-value function in this experiment. The discrete-time dynamics of the vertical movement of a quadrotor is given by","element":"span"}],[{"style":{"width":"80%"},"width":809,"height":279,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/8-1.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":16},"width":175.93,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/8-2.png","element":"img","alt":" ∆t ∈ (0,∞)","inline":true,"padRight":true},{"text":"denotes the time interval, and x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"and ˙x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"are the vertical position and the vertical velocity of the quadrotor at time instant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":", respectively. 
When the mass of the quadrotor is 0","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"},{"text":"027 kg, the nominal model is given by ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":"1 ","element":"span"},{"text":"= ","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":"2 ","element":"span"},{"text":"= ","element":"span"},{"text":"9","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"},{"text":"81, and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":"3 ","element":"span"},{"text":"= ","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"0","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"},{"text":"027. Let the time interval ","element":"span"},{"style":{"height":11.2},"width":25,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/8-3.png","element":"img","alt":" ∆","inline":true},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"be 0","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"},{"text":"02 seconds for the simulations, and let the maximum input be 2","element":"span"},{"style":{"height":11.2},"width":67.03,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/8-4.png","element":"img","alt":"×0.","inline":true},{"text":"027","element":"span"},{"style":{"height":11.6},"width":67.02,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/8-5.png","element":"img","alt":"×9.","inline":true},{"text":"81.","element":"span"}],[{"text":"Control barrier certificates are used to limit the region of exploration to the area: x 
","element":"span"},{"style":{"height":16},"width":146.63,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/8-6.png","element":"img","alt":" ∈ [−3,3]","inline":true},{"text":", and we employ the following two barrier functions:","element":"span"}],[{"style":{"width":"23%"},"width":232,"height":101,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/8-7.png","element":"img"}],[{"text":"and we use the barrier-certificate parameter ","element":"span"},{"style":{"height":14.4},"width":114.5,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/8-8.png","element":"img","alt":" η = 0.","inline":true},{"text":"01 (see ","element":"span"},{"text":"(III.1)","element":"span"},{"text":") in this experiment. Note that the safe set is equivalently expressed by","element":"span"}],[{"style":{"width":"76%"},"width":764,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/8-9.png","element":"img"}],[{"text":"and the barrier functions satisfy Assumption ","element":"span"},{"text":"IV.2.","element":"span"},{"text":"2 with the Lipschitz constant ","element":"span"},{"style":{"height":8.8},"width":63.96,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/8-10.png","element":"img","alt":" ν =","inline":true,"padRight":true},{"text":"0. The immediate reward is given by","element":"span"}],[{"style":{"width":"62%"},"width":626,"height":82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/8-11.png","element":"img"}],[{"text":"where the constant is added to prevent the resulting value of explored states from becoming negative, i.e., lower than the value outside of the safe set.","element":"span"}],[{"style":{"width":"80%"},"width":808,"height":603,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/8-12.png","element":"img"}],[{"text":"Fig. V.1. 
","element":"figcaption","subtype":"caption"},{"id":"id-30","text":"Trajectories of the vector ","element":"figcaption","subtype":"caption"},{"style":{"height":12.8},"width":118.36,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/8-13.png","element":"img","alt":" [x;h2;h3]","inline":true,"padRight":true},{"text":"of the GP-based learning and the adaptive model learning algorithm with barrier certificates from ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"n ","element":"figcaption","subtype":"caption"},{"text":"= ","element":"figcaption","subtype":"caption"},{"text":"1000 to ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"n ","element":"figcaption","subtype":"caption"},{"text":"= ","element":"figcaption","subtype":"caption"},{"text":"10000. The trajectory of the adaptive model learning algorithm converges to the forward invariant set ","element":"figcaption","subtype":"caption"},{"style":{"height":9.6},"width":82.8,"height":24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/8-14.png","element":"img","alt":" C ×Ω","inline":true},{"text":". GP-based learning seems to approach the safe set slowly, although safety recovery is not theoretically supported in the current settings.","element":"figcaption","subtype":"caption"}],[{"style":{"fontStyle":"italic"},"text":"1) Stability of the Safe Set: ","element":"span"},{"text":"In terms of safety recovery, we compare a GP-based approach, which tends to be less adaptive to time-varying systems, and a set-theoretical adaptive model learning algorithm with the monotone approximation property. 
Random explorations by uniformly random control inputs are conducted for the first 20 seconds corresponding to 1000 iterations under the dynamics ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"height":16},"width":76.07,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/8-15.png","element":"img","alt":"∗ = [","inline":true},{"text":"1;9","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"},{"text":"81;1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"0","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"},{"text":"027","element":"span"},{"text":"]","element":"span"},{"text":". Then, we change the simulated dynamics and observe if the quadrotor is stabilized on the set of augmented safe states. To clearly visualize the difference between the GP-based approach and the adaptive model learning algorithm, we let the new agent dynamics be ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"height":16},"width":79.16,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/8-16.png","element":"img","alt":"∗ = [","inline":true},{"text":"1;9","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"},{"text":"81;5","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"0","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"},{"text":"027","element":"span"},{"text":"]","element":"span"},{"text":", which is an extreme situation where the maximum input generates very large acceleration.","element":"span"}],[{"text":"We define the update rule of model learning as","element":"span"}],[{"style":{"width":"90%"},"width":907,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/8-17.png","element":"img"}],[{"text":"which satisfies the monotone 
approximation property","element":"span"},{"text":"2","element":"span"},{"text":", where ","element":"span"},{"style":{"height":16},"width":154.75,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/8-18.png","element":"img","alt":"λ ∈ (0,2)","inline":true,"padRight":true},{"text":"is the step size. In this experiment, we use ","element":"span"},{"style":{"height":12.4},"width":103.32,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/8-19.png","element":"img","alt":" λ = 0.","inline":true},{"text":"6. For GP-based learning, on the other hand, we let the noise variance of the output be 0","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"},{"text":"01, and let the prior covariance of the parameter vector ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h ","element":"span"},{"text":"be 25","element":"span"},{"style":{"fontStyle":"italic"},"text":"I","element":"span"},{"text":".","element":"span"}],[{"text":"The trajectories of the vector ","element":"span"},{"text":"[","element":"span"},{"text":"x;","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":"2","element":"span"},{"text":";","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":"3","element":"span"},{"text":"] ","element":"span"},{"text":"of the GP-based learning and the adaptive model learning algorithm from ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= ","element":"span"},{"text":"1000 to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= ","element":"span"},{"text":"10000 are plotted in Figure ","element":"span"},{"href":"#id-30","text":"V.1. 
","element":"a"},{"text":"We can observe that the trajectory of the adaptive model learning algorithm converges to the forward invariant set ","element":"span"},{"style":{"height":11.6},"width":107.13,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/8-20.png","element":"img","alt":" C ×Ω","inline":true},{"text":". GP-based learning seems to approach the safe set slowly, although safety recovery is not theoretically supported in the current settings.","element":"span"}],[{"text":"TABLE V.1 ","element":"figcaption","subtype":"caption"},{"id":"id-31","text":"SUMMARY OF THE PARAMETER SETTINGS OF THE SIMULATED VERTICAL MOVEMENTS OF A QUADROTOR (KERNEL ADAPTIVE FILTER)","element":"figcaption","subtype":"caption"}],[{"style":{"width":"83%"},"width":840,"height":329,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/9-0.png","element":"img"}],[{"id":"id-8","style":{"fontStyle":"italic"},"text":"2) Adaptive Action-value Function Approximation: ","element":"span"},{"text":"We also validate our action-value function approximation 
framework by employing a GP (i.e., the GP SARSA) and a kernel adaptive filter in the same RKHS. The parameter settings for the kernel adaptive filter are summarized in Table ","element":"span"},{"href":"#id-31","text":"V.1. ","element":"a"},{"text":"Please refer to Appendix ","element":"span"},{"text":"H ","element":"span"},{"text":"for the notations that are not in the main text. Six Gaussian kernels with different scale parameters ","element":"span"},{"style":{"height":8.4},"width":27,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/9-1.png","element":"img","alt":" σ","inline":true,"padRight":true},{"text":"are employed for the kernel adaptive filter (i.e., ","element":"span"},{"style":{"fontStyle":"italic"},"text":"M ","element":"span"},{"text":"= ","element":"span"},{"text":"6. See also Appendix ","element":"span"},{"text":"H ","element":"span"},{"text":"for more detail about multikernel adaptive filter). For the GP SARSA, we employ a Gaussian kernel with scale parameter 3, which achieved sufficiently good performance, and let the noise variance of the output be 10","element":"span"},{"style":{"height":8.4},"width":37.91,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/9-2.png","element":"img","alt":"−6","inline":true,"padRight":true},{"text":"(i.e., ","element":"span"},{"style":{"height":10.8},"width":65.87,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/9-3.png","element":"img","alt":" Σ =","inline":true,"padRight":true},{"text":"10","element":"span"},{"style":{"height":8.4},"width":37.91,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/9-4.png","element":"img","alt":"−6","inline":true},{"style":{"fontStyle":"italic"},"text":"I","element":"span"},{"text":". See Appendix ","element":"span"},{"text":"I.","element":"span"},{"text":"). Other parameters are the same as those of the kernel adaptive filter. 
In addition, we also test the GP SARSA in another setting, where kernel functions are added only during the first 600 iterations (i.e., the dimension of the parameter becomes ","element":"span"},{"style":{"fontStyle":"italic"},"text":"r ","element":"span"},{"text":"= ","element":"span"},{"text":"600) and no kernel function is added after that. We call this the GP SARSA 2 for convenience in this section. We employ an adaptive model learning algorithm for all three reinforcement learning approaches, and update policies every 1000 iterations. For comparison purposes, we do not reset learning even when the policy is updated","element":"span"},{"text":"3","element":"span"},{"text":". Random explorations by uniformly random control inputs are conducted for the first 200 seconds corresponding to 10000 iterations under the dynamics ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"height":16},"width":73.42,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/9-5.png","element":"img","alt":"∗ = [","inline":true},{"text":"1;9","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"},{"text":"81;1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"0","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"},{"text":"027","element":"span"},{"text":"]","element":"span"},{"text":", and the dynamics changes to ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"height":16},"width":79.16,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/9-6.png","element":"img","alt":"∗ = 
[","inline":true},{"text":"1;11","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"},{"text":"81;0","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"},{"text":"9","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"0","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"},{"text":"027","element":"span"},{"text":"] ","element":"span"},{"text":"(representing, for example, an additional downward acceleration and battery degradation) at time instant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= ","element":"span"},{"text":"2500. We evaluate the policy obtained at time instant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= ","element":"span"},{"text":"10000 five times with different initial states, and we conduct 15 runs for learning. For each policy evaluation, the initial position x follows the uniform distribution while the initial velocity is ˙x ","element":"span"},{"text":"= ","element":"span"},{"text":"0.","element":"span"}],[{"text":"The learning curves of the normalized mean squared errors (NMSEs) of action-value function approximation, which are averaged over 15 runs and smoothed, are plotted in Figure ","element":"span"},{"href":"#id-32","text":"V.2 ","element":"a"},{"text":"for the GP SARSA, the kernel adaptive filter and the GP SARSA 2. From Figure ","element":"span"},{"href":"#id-32","text":"V.2, ","element":"a"},{"text":"we can observe that both the GP SARSA and the kernel adaptive filter show no large degradation of the NMSE even after the dynamics changes or the policy is updated, while the GP SARSA 2 stops improving the NMSE after the policy is updated (and the dynamics is changed). 
Because no kernel function is newly added after","element":"span"}],[{"style":{"width":"87%"},"width":876,"height":416,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/9-7.png","element":"img"}],[{"text":"0 ","element":"span"},{"id":"id-32","text":"1000 ","element":"span"},{"text":"2000 ","element":"span"},{"text":"3000 ","element":"span"},{"text":"4000 ","element":"span"},{"text":"5000 ","element":"span"},{"text":"6000 ","element":"span"},{"text":"7000 ","element":"span"},{"text":"8000 ","element":"span"},{"text":"9000 10000 ","element":"span"},{"text":"Iteration","element":"span"}],[{"text":"Fig. V.2. ","element":"span"},{"text":"The learning curves of the normalized mean squared errors (NMSEs) of action-value function approximation for the GP SARSA, kernel adaptive filter, and the GP SARSA 2.","element":"span"}],[{"text":"TABLE V.2 ","element":"span"},{"id":"id-33","text":"T","element":"span"},{"text":"HE EXPECTED VALUES OF THE ","element":"span"},{"text":"GP SARSA ","element":"span"},{"text":"AND THE KERNEL ADAPTIVE","element":"span"}],[{"style":{"width":"77%"},"width":774,"height":118,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/9-8.png","element":"img"}],[{"text":"the first 600 iterations, the GP SARSA 2 could not adapt to the new policy or new dynamics.","element":"span"}],[{"text":"The expected values ","element":"span"},{"style":{"fontStyle":"italic"},"text":"E","element":"span"},{"style":{"height":19.2},"width":17,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/9-9.png","element":"img","alt":"�","inline":true},{"style":{"fontStyle":"italic"},"text":"V ","element":"span"},{"style":{"height":19.27},"width":88.5,"height":48.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/9-10.png","element":"img","alt":"φ(x)�","inline":true,"padRight":true},{"text":"for the GP SARSA, the kernel adaptive filter and the GP SARSA 2 associated with the policies obtained at time 
instant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= ","element":"span"},{"text":"10000 are shown in Table ","element":"span"},{"href":"#id-33","text":"V.2. ","element":"a"},{"text":"(expectation is taken over the 15 ","element":"span"},{"style":{"height":8},"width":31,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/9-11.png","element":"img","alt":" ×","inline":true,"padRight":true},{"text":"5 runs, i.e., 15 runs for learning, each of which includes five policy evaluations). Recall that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"V ","element":"span"},{"style":{"height":10.8},"width":17,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/9-12.png","element":"img","alt":"φ","inline":true,"padRight":true},{"text":"is defined in ","element":"span"},{"href":"#id-34","text":"(II.1)","element":"a"},{"text":".","element":"span"}],[{"text":"Among the 15 runs for the kernel adaptive filter, we extracted the seventh run, which was successful. The left figure of Figure ","element":"span"},{"href":"#id-35","text":"V.3 ","element":"a"},{"text":"illustrates the action-value function at time instant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= ","element":"span"},{"text":"10000 of the seventh run for the kernel adaptive filter, and the right figure of Figure ","element":"span"},{"href":"#id-35","text":"V.3 ","element":"a"},{"text":"plots the trajectory of the optimal policy obtained at time instant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= ","element":"span"},{"text":"10000 for the seventh run. 
The simulated quadrotor was relocated at time instants ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= ","element":"span"},{"text":"11000","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"text":"12000","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"text":"13000, and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= ","element":"span"},{"text":"14000, and both the position and the velocity of the simulated quadrotor converged to zero each time.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"3) Discussion: ","element":"span"},{"text":"The control barrier certificates with an adaptive model learning algorithm recovered safety even in an extreme situation where the control inputs start generating very large accelerations. As long as the model learning algorithm satisfies Assumption ","element":"span"},{"text":"IV.1, ","element":"span"},{"text":"safety recovery is guaranteed.","element":"span"}],[{"text":"Reinforcement learning with the GP SARSA and kernel adaptive filter in the RKHS ","element":"span"},{"style":{"height":18.75},"width":78.29,"height":46.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/9-13.png","element":"img","alt":" HψQ","inline":true,"padRight":true},{"text":"worked sufficiently well. If no kernel functions are newly added, GP-based learning cannot adapt to new policies or agent dynamics. Therefore, we need to sequentially add new kernel functions or use a sparse adaptive filter to prune redundant kernel functions (see also Appendix ","element":"span"},{"text":"H ","element":"span"},{"text":"for a sparse adaptive filter). 
We note that identifying the RKHS ","element":"span"},{"style":{"height":18.75},"width":78.29,"height":46.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/9-14.png","element":"img","alt":" HψQ","inline":true,"padRight":true},{"text":"enabled us to employ GPs for nonstationary agent dynamics without having to reset learning. Consequently, we can effectively reuse the previous estimate of the target function if the new target function is close to the previous one.","element":"span"}],[{"text":"Our safe learning framework validated by these simulations","element":"span"}],[{"style":{"width":"91%"},"width":918,"height":370,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/10-0.png","element":"img"}],[{"text":"Fig. V.3. ","element":"span"},{"id":"id-35","text":"The left figure illustrates the action-value function over the position x and the velocity ˙x at ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= ","element":"span"},{"text":"10000 and at the control input ","element":"span"},{"style":{"height":12.8},"width":267.11,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/10-1.png","element":"img","alt":" −0.027×11.81/0.9,","inline":true,"padRight":true},{"text":"which cancels out the acceleration added to the quadrotor. Positive velocities for negative positions and negative velocities for positive positions have higher values. The right figure shows the trajectory of the optimal policy obtained at time instant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= ","element":"span"},{"text":"10000 for the seventh run, which was a successful run among the 15 runs. Dashed lines indicate the times at which the quadrotor was relocated. 
Both the position and the velocity of the simulated quadrotor converged to zero.","element":"span"}],[{"style":{"width":"92%"},"width":926,"height":1042,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/10-2.png","element":"img"}],[{"text":"Fig. V.4. ","element":"span"},{"id":"id-36","text":"A picture of the brushbot used in the experiment. Vibrations of the ","element":"span"},{"text":"two motors propagate to the two brushes, driving the brushbot. The control input is two-dimensional, with each component corresponding to the rotational speed of a motor.","element":"span"}],[{"text":"is now ready to be applied to a real robot, the brushbot, as presented below.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"B. Real-Robotics Experiments on the Brushbot","element":"span"}],[{"text":"Next, we apply our safe learning framework, which was validated by simulations, to the brushbot, which has highly nonlinear, nonholonomic and nonstationary dynamics (see Figure ","element":"span"},{"href":"#id-36","text":"V.4)","element":"a"},{"text":". The objective of this experiment is to find a policy driving the brushbot to the origin, while restricting the region of exploration. 
The experiment is conducted at the Robotarium, a remotely accessible robot testbed at the Georgia Institute of Technology [52].","element":"span"}],[{"style":{"width":"100%"},"width":1006,"height":626,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/10-3.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"1) Experimental Condition: ","element":"span"},{"text":"The experimental conditions for model learning, reinforcement learning, control barrier functions and their parameter settings are presented below.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"a) Model learning: ","element":"span"},{"text":"The state ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x ","element":"span"},{"text":"= [","element":"span"},{"text":"x;y;","element":"span"},{"style":{"height":16},"width":36.34,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/10-4.png","element":"img","alt":"θ]","inline":true,"padRight":true},{"text":"consists of the X position x, the Y position y and the orientation ","element":"span"},{"style":{"height":16},"width":188.29,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/10-5.png","element":"img","alt":" θ ∈ [−π,π]","inline":true,"padRight":true},{"text":"of the brushbot in the world frame. The exact positions and the orientation are recorded by motion capture systems every 0","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"},{"text":"3 seconds. A control input ","element":"span"},{"style":{"fontWeight":"bold"},"text":"u ","element":"span"},{"text":"is two-dimensional, with each component corresponding to the rotational speed of a motor. To improve the learning efficiency and reduce the total learning time required, we identify the most significant dimension and reduce the number of dimensions to learn. 
The sole input variable of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"for the shifts of x and y, is assumed to be ","element":"span"},{"style":{"height":11.6},"width":23,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/10-6.png","element":"img","alt":" θ","inline":true},{"text":". The shift of ","element":"span"},{"style":{"height":11.6},"width":23,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/10-7.png","element":"img","alt":" θ","inline":true,"padRight":true},{"text":"is assumed to be constant over the state, and hence depends on nothing but control inputs (see Section ","element":"span"},{"href":"#id-37","text":"V-B.1.d)","element":"a"},{"text":". The brushbot used in the present study is nonholonomic, i.e., it can only go forward, and positive control inputs basically drive the brushbot in the same way as negative control inputs. As such, we use the rotational speeds of the motors as the control inputs. 
Moreover, to eliminate the effect of static frictions on the model, we assume that the zero control input given to the algorithm actually generates some minimum control inputs ","element":"span"},{"style":{"fontStyle":"italic"},"text":"u","element":"span"},{"style":{"height":9.2},"width":17,"height":23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/10-8.png","element":"img","alt":"δ","inline":true,"padRight":true},{"text":"to the motors, i.e., the actual maximum control inputs to the motors are given by ","element":"span"},{"style":{"fontStyle":"italic"},"text":"u","element":"span"},{"text":"max ","element":"span"},{"style":{"height":12.9},"width":74.22,"height":32.25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/10-9.png","element":"img","alt":" + uδ","inline":true},{"text":", where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"u","element":"span"},{"text":"max ","element":"span"},{"text":"is the maximum control input fed to the algorithm.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"b) Reinforcement learning: ","element":"span"},{"text":"The state for action-value function approximation consists of the distance ","element":"span"},{"style":{"height":16.06},"width":30.92,"height":40.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/10-10.png","element":"img","alt":" ∥[","inline":true},{"text":"x;y","element":"span"},{"style":{"height":16.64},"width":64.26,"height":41.6,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/10-11.png","element":"img","alt":"]∥R2","inline":true,"padRight":true},{"text":"from the origin and the orientation ","element":"span"},{"style":{"height":11.6},"width":60.03,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/10-12.png","element":"img","alt":" θ 
−","inline":true},{"text":"atan2","element":"span"},{"text":"(","element":"span"},{"text":"y","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"text":"x","element":"span"},{"text":") ","element":"span"},{"text":"which is wrapped to the interval ","element":"span"},{"style":{"height":16},"width":118.68,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/10-13.png","element":"img","alt":" [−π,π]","inline":true},{"text":". The immediate reward is given by","element":"span"}],[{"style":{"width":"59%"},"width":601,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/10-14.png","element":"img"}],[{"text":"where the constant is added to prevent the resulting value of explored states from becoming negative, namely, lower than the value outside of the region of exploration.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"c) Discrete-time control barrier certificates: ","element":"span"},{"text":"Control barrier certificates are used to limit the region of exploration to the rectangular area: x ","element":"span"},{"style":{"height":16},"width":79.16,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/10-15.png","element":"img","alt":" ∈ [−","inline":true},{"text":"x","element":"span"},{"text":"max","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"text":"x","element":"span"},{"text":"max","element":"span"},{"style":{"height":16},"width":146.74,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/10-16.png","element":"img","alt":"], y ∈ [−","inline":true},{"text":"y","element":"span"},{"text":"max","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"text":"y","element":"span"},{"text":"max","element":"span"},{"text":"]","element":"span"},{"text":", where x","element":"span"},{"text":"max ","element":"span"},{"style":{"fontStyle":"italic"},"text":"> 
","element":"span"},{"text":"0 and y","element":"span"},{"text":"max ","element":"span"},{"style":{"fontStyle":"italic"},"text":"> ","element":"span"},{"text":"0. Because the brushbot can only go","element":"span"}],[{"text":"forward, we employ the following four barrier functions:","element":"span"}],[{"style":{"width":"48%"},"width":490,"height":277,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/11-0.png","element":"img"}],[{"text":"(see Example ","element":"span"},{"href":"#id-38","text":"IV.1 ","element":"a"},{"text":"for the motivations of using the above control barrier functions). Note that those functions satisfy Assumption ","element":"span"},{"text":"IV.2.","element":"span"},{"text":"2 and the Lipschitz constant ","element":"span"},{"style":{"height":8.8},"width":23,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/11-1.png","element":"img","alt":" ν","inline":true,"padRight":true},{"text":"is zero except at around ","element":"span"},{"style":{"height":17.78},"width":251.32,"height":44.45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/11-2.png","element":"img","alt":" θ = − π2 ,0, π2 ,π","inline":true},{"text":". (Although we can employ globally ","element":"span"},{"text":"Lipschitz functions for more rigorous treatment, we use the ","element":"span"},{"id":"id-37","text":"above functions for simplicity.) ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d) Parameter settings: ","element":"span"},{"text":"The parameter settings are summarized in Table ","element":"span"},{"href":"#id-39","text":"V.3. ","element":"a"},{"text":"Please refer to Appendix ","element":"span"},{"text":"H ","element":"span"},{"text":"for the notations that are not in the main text. 
Five Gaussian kernels with different scale parameters ","element":"span"},{"style":{"height":8.4},"width":27,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/11-3.png","element":"img","alt":" σ","inline":true,"padRight":true},{"text":"are employed in action-value function approximation (i.e., ","element":"span"},{"style":{"fontStyle":"italic"},"text":"M ","element":"span"},{"text":"= ","element":"span"},{"text":"5. See also Appendix ","element":"span"},{"text":"H ","element":"span"},{"text":"for more detail about multikernel adaptive filter), and six Gaussian kernels are employed in model learning for x and y (i.e., ","element":"span"},{"style":{"fontStyle":"italic"},"text":"M ","element":"span"},{"text":"= ","element":"span"},{"text":"6). In model learning for ","element":"span"},{"style":{"height":11.6},"width":23,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/11-4.png","element":"img","alt":" θ","inline":true},{"text":", we define ","element":"span"},{"style":{"height":16.44},"width":140.89,"height":41.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/11-5.png","element":"img","alt":" Hp, H f","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":16.39},"width":53.62,"height":40.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/11-6.png","element":"img","alt":"Hg","inline":true,"padRight":true},{"text":"as sets of constant functions.","element":"span"}],[{"text":"The kernels of ","element":"span"},{"style":{"height":16.39},"width":58.82,"height":40.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/11-7.png","element":"img","alt":" Hp","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":16.44},"width":61.04,"height":41.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/11-8.png","element":"img","alt":" H f","inline":true,"padRight":true},{"text":"are 
weighed by ","element":"span"},{"style":{"height":11.2},"width":96.83,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/11-9.png","element":"img","alt":" τ = 0.","inline":true},{"text":"1 in model learning (see Lemma ","element":"span"},{"href":"#id-40","text":"H.1 ","element":"a"},{"text":"in Appendix ","element":"span"},{"text":"H)","element":"span"},{"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"e) Procedure: ","element":"span"},{"text":"The time interval (duration of one iteration) for learning is 0","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"},{"text":"3 seconds, and random explorations are conducted for the first 300 seconds corresponding to 1000 iterations. While exploring, the model learning algorithm adaptively learns a model whose control-affine terms, i.e., ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":")+ ","element":"span"},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"text":", is used in combination with barrier certificates. Although barrier functions employed in the experiment reduce deadlock situations, the brushbot is forced to turn inward the region of exploration when a deadlock is detected. Note that the barrier certificates are intentionally violated in such a case. The policy is updated every 50 seconds. After 300 seconds, we stop learning a model and the action-value function, and the policy replaces random explorations. 
The brushbot is forced to stop when it enters into the circle of radius 0","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"},{"text":"2 centered at the origin. When the brushbot is driven close to the origin and enters this circle, it is pushed away from the origin to see if it returns to the origi","element":"span"},{"href":"#id-41","text":"n ag","element":"a"},{"text":"ain (see Figure ","element":"span"},{"href":"#id-42","text":"V.10)","element":"a"},{"text":".","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"2) Results: ","element":"span"},{"text":"Figure ","element":"span"},{"href":"#id-41","text":"V.5 ","element":"a"},{"text":"plots ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"([","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":";0;0","element":"span"},{"text":"])","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"href":"#id-42","text":"ˆ","element":"a"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":")","element":"span"},{"text":", ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"style":{"height":21.23},"width":91.01,"height":53.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/11-10.png","element":"img","alt":"(1)n (x)","inline":true,"padRight":true},{"text":"and ","element":"span"},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"style":{"height":21.23},"width":91.01,"height":53.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/11-11.png","element":"img","alt":"(2)n (x)","inline":true,"padRight":true},{"text":"for x and y at 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= ","element":"span"},{"text":"1000. Here ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"style":{"height":19.42},"width":30.64,"height":48.56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/11-12.png","element":"img","alt":"(i)n","inline":true,"padRight":true},{"text":"is the estimate of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"style":{"height":11.6},"width":30.64,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/11-13.png","element":"img","alt":"(i)","inline":true,"padRight":true},{"text":"at time instant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":". Recall that these functions only depend on ","element":"span"},{"style":{"height":11.6},"width":23,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/11-14.png","element":"img","alt":" θ","inline":true,"padRight":true},{"text":"in this experiment to improve the learning efficiency. 
For the shift of ","element":"span"},{"style":{"height":11.6},"width":23,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/11-15.png","element":"img","alt":" θ","inline":true},{"text":", the estimators are constant over the state, and the result is ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"style":{"height":21.23},"width":176.44,"height":53.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/11-16.png","element":"img","alt":"(1)n (x) = 1.","inline":true},{"text":"38, ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"style":{"height":21.23},"width":207.4,"height":53.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/11-17.png","element":"img","alt":"(2)n (x) = −0.","inline":true},{"text":"77 ","element":"span"},{"text":"and ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"([","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":";0;0","element":"span"},{"text":"]) = ","element":"span"},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":") = ","element":"span"},{"text":"0 at ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= ","element":"span"},{"text":"1000. 
As can be seen in Figure ","element":"span"},{"href":"#id-41","text":"V.5, ","element":"a"},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"([","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":";0;0","element":"span"},{"text":"]) ","element":"span"},{"text":"is almost zero and so is ","element":"span"},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":")","element":"span"},{"text":", implying that the proposed algorithm successfully dropped off irrelevant structural components of a model.","element":"span"}],[{"text":"Figure ","element":"span"},{"href":"#id-43","text":"V.6 ","element":"a"},{"text":"plots the trajectory of the brushbot while exploring (i.e., X,Y positions from ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= ","element":"span"},{"text":"0 to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= ","element":"span"},{"text":"1000). It is observed that the brushbot remained in the region of exploration (x ","element":"span"},{"style":{"height":16},"width":206.87,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/11-18.png","element":"img","alt":" ∈ [−1.2,1.2]","inline":true,"padRight":true},{"text":"and y ","element":"span"},{"style":{"height":16},"width":206.87,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/11-19.png","element":"img","alt":" ∈ [−1.2,1.2]","inline":true},{"text":") most of the time. 
Moreover, the values of barrier functions ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"style":{"fontStyle":"italic"},"text":"i","element":"span"},{"style":{"height":16},"width":251.96,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/11-20.png","element":"img","alt":", i ∈ {1,2,3,4}","inline":true},{"text":", for the whole trajectory are plotted in Figure ","element":"span"},{"href":"#id-44","text":"V.7. ","element":"a"},{"text":"Even though","element":"span"}],[{"style":{"width":"88%"},"width":893,"height":408,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/11-21.png","element":"img"}],[{"text":"Fig. V.5. ","element":"span"},{"id":"id-41","text":"Estimated output of the model estimator at ","element":"span"},{"style":{"fontWeight":"bold"},"text":"u ","element":"span"},{"text":"= [","element":"span"},{"text":"0;0","element":"span"},{"text":"] ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= ","element":"span"},{"text":"1000 over the orientation ","element":"span"},{"style":{"height":9.2},"width":19,"height":23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/11-22.png","element":"img","alt":" θ","inline":true},{"text":". Irrelevant structures such as ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"and ","element":"span"},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"dropped off successfully.","element":"span"}],[{"text":"some violations of safety are seen in the figure, the brushbot returned to the safe region before large violations occurred. 
Despite the unknown, highly complex, and nonstationary dynamics, the proposed safe learning framework was shown to work efficiently.","element":"span"}],[{"text":"Figure ","element":"span"},{"href":"#id-45","text":"V.8 ","element":"a"},{"text":"plots the trajectories of the optimal policy learned by the brushbot. Once the optimal policy replaced random explorations, the brushbot returned to the origin until ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= ","element":"span"},{"text":"1016 as the first figure shows. The brushbot was pushed by a sweeper at time instants ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= ","element":"span"},{"text":"1031","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"text":"1075","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"text":"1101","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"text":"1128","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"text":"1181, and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= ","element":"span"},{"text":"1230, and the trajectories of the brushbot after being pushed at ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= ","element":"span"},{"text":"1031","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"text":"1075","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"text":"1101 are also shown in Figure ","element":"span"},{"href":"#id-45","text":"V.8. ","element":"a"},{"text":"Dashed lines in the last figure indicate the times at which the brushbot was pushed away. 
Given the relatively short learning time and the fact that no simulator was used, the brushbot learned the desir","element":"span"},{"href":"#id-46","text":"able ","element":"a"},{"text":"behavior sufficiently well.","element":"span"}],[{"text":"Figure ","element":"span"},{"href":"#id-46","text":"V.9 ","element":"a"},{"text":"plots the shape of ","element":"span"},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"style":{"height":20.14},"width":82.51,"height":50.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/11-23.png","element":"img","alt":"φn ([∥[","inline":true},{"text":"x;y","element":"span"},{"style":{"height":16.64},"width":64.26,"height":41.6,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/11-24.png","element":"img","alt":"]∥R2","inline":true,"padRight":true},{"text":";0","element":"span"},{"text":"]","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"text":"[","element":"span"},{"text":"0;0","element":"span"},{"text":"]) ","element":"span"},{"text":"over ","element":"span"},{"text":"X,Y positions at ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= ","element":"span"},{"text":"1000. It is observed that when the control input is zero (i.e., when the brushbot basically does not move), the vicinity of the origin has the highest value, which is reasonable.","element":"span"}],[{"text":"Finally, Figure ","element":"span"},{"href":"#id-42","text":"V.10 ","element":"a"},{"text":"shows two trajectories of the brushbot returning to the origin by using the action-value function saved at ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= ","element":"span"},{"text":"1000. 
After being pushed away from the origin, the brushbot successfully returned to the origin again.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"3) Discussion: ","element":"span"},{"text":"One challenge of the experiment is that no initial data or simulators were available. Although the brushbot, whose dynamics is highly complex, had to learn an optimal policy while maintaining safety by means of an adaptive model learning algorithm, the proposed learning framework worked well in the real world. The brushbot is powered by brushes, and its dynamics depends strongly on the conditions of the floor and the brushes. Changes in the agent dynamics thus led to some violations of safety. Nevertheless, our learning framework recovered safety quickly. In addition, the agent learned a ","element":"span"},{"style":{"fontStyle":"italic"},"text":"good ","element":"span"},{"text":"policy within a short period. One reason for this adaptivity and data efficiency is the convex-analytic formulation.","element":"span"}],[{"text":"On the other hand, because no initial nominal model or policy is available and our framework is fully adaptive, i.e., we do ","element":"span"},{"style":{"fontStyle":"italic"},"text":"not ","element":"span"},{"text":"collect data to conduct batch model learning and/or reinforcement learning, we need to reduce the dimensions of the input vectors to speed up learning and make it more robust. This can be","element":"span"}],[{"id":"id-39","style":{"width":"93%"},"width":1914,"height":1421,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/12-0.png","element":"img"}],[{"text":"Fig. V.6. ","element":"span"},{"id":"id-43","text":"The left figure shows the trajectory of the brushbot while exploring, and the right figure shows X,Y positions over iterations. 
The region of ","element":"span"},{"text":"exploration is limited to x ","element":"span"},{"style":{"height":12.8},"width":420.63,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/12-1.png","element":"img","alt":" ∈ [−1.2,1.2] and y ∈ [−1.2,1.2]","inline":true},{"text":". The brushbot remained in the region most of the time.","element":"span"}],[{"style":{"width":"85%"},"width":1766,"height":597,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/12-2.png","element":"img"}],[{"text":"Fig. V.7. ","element":"span"},{"id":"id-44","text":"The values of four control barrier functions employed in the experiment for the whole trajectory. Even though some violations of safety were seen, ","element":"span"},{"text":"the brushbot returned to the safe region before large violations occurred. The nonholonomic brushbot adaptively learned a model to turn inward the region of exploration before reaching the boundaries of the region of exploration.","element":"span"}],[{"text":"an inherent limitation of our framework.","element":"span"}]]},{"heading":"VI. CONCLUSION","paragraphs":[[{"text":"The learning framework presented in this paper successfully tied model learning, reinforcement learning, and barrier certificates, enabling barrier-certified reinforcement learning for unknown, highly nonlinear, nonholonomic, and possibly nonstationary agent dynamics. The proposed model learning algorithm captures a structure of the agent dynamics by employing a sparse optimization. The resulting model has preferable structure for preserving efficient computations of barrier certificates. In addition, recovery of safety after an","element":"span"}],[{"style":{"width":"96%"},"width":1984,"height":1169,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/13-0.png","element":"img"}],[{"text":"Fig. V.8. ","element":"span"},{"id":"id-45","text":"Trajectories of the optimal policy learned by the brushbot. 
The optimal policy replaced random explorations at ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= ","element":"span"},{"text":"1000, and the brushbot returned to the origin until ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= ","element":"span"},{"text":"1016 (first figure). The brushbot was pushed by a sweeper at time instant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= ","element":"span"},{"text":"1031","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"text":"1075","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"text":"1101","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"text":"1128","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"text":"1181, and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= ","element":"span"},{"text":"1230. Dashed lines in the last figure indicate the time when the brushbot was pushed away. The brushbot learned the desirable behavior sufficiently well.","element":"span"}],[{"style":{"width":"77%"},"width":782,"height":628,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/13-1.png","element":"img"}],[{"text":"Fig. V.9. ","element":"span"},{"id":"id-46","text":"The shape of the action-value function over X,Y positions at the ","element":"span"},{"text":"control input ","element":"span"},{"style":{"fontWeight":"bold"},"text":"u ","element":"span"},{"text":"= [","element":"span"},{"text":"0;0","element":"span"},{"text":"] ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= ","element":"span"},{"text":"1000. 
The vicinity of the origin has the highest value when the control input is zero.","element":"span"}],[{"text":"unexpected and abrupt change of the agent dynamics was guaranteed by employing barrier certificates and a model learning algorithm with monotone approximation property under certain conditions. For possibly nonstationary agent dynamics, the action-value function approximation problem was appropriately reformulated so that kernel-based methods, including kernel adaptive filter, can be directly applied in an RKHS. Lastly, certain conditions were also presented to render the set of safe policies convex, thereby guaranteeing the global optimality of solutions to the policy update to ensure the greedy improvement of a policy. The experimental result shows the efficacy of the proposed learning framework in the real world.","element":"span"}]]},{"heading":"APPENDIX A","paragraphs":[[{"style":{"width":"49%"},"width":496,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/13-2.png","element":"img"}],[{"text":"See [24, Proposition 4] for the proof of forward invariance. 
The set ","element":"span"},{"style":{"height":12.4},"width":127.91,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/13-3.png","element":"img","alt":" C ⊂ X","inline":true,"padRight":true},{"text":"is asymptotically stable as","element":"span"}],[{"style":{"width":"57%"},"width":582,"height":55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/13-4.png","element":"img"}],[{"text":"where the inequality holds from [24, Proposition 1].","element":"span"}]]},{"heading":"APPENDIX B","paragraphs":[[{"style":{"width":"43%"},"width":432,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/13-5.png","element":"img"}],[{"text":"From Assumptions ","element":"span"},{"text":"IV.1.","element":"span"},{"text":"1, ","element":"span"},{"text":"IV.1.","element":"span"},{"text":"2, ","element":"span"},{"text":"IV.1.","element":"span"},{"text":"5, and from the facts that the estimated output is linear to the model parameter at a fixed input and that ","element":"span"},{"style":{"height":16},"width":20,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/13-6.png","element":"img","alt":" ∥","inline":true},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":8.8},"width":76.19,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/13-7.png","element":"img","alt":"+1 −","inline":true,"padRight":true},{"text":"ˆ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":99.43,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/13-8.png","element":"img","alt":"+1∥ ≥","inline":true,"padRight":true},{"text":"0, we 
obtain","element":"span"}],[{"style":{"width":"67%"},"width":680,"height":103,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/13-9.png","element":"img"}],[{"text":"for some bounded ","element":"span"},{"style":{"height":18.04},"width":81.39,"height":45.1,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/13-10.png","element":"img","alt":" ρ24 ≥","inline":true,"padRight":true},{"text":"0. From Assumptions ","element":"span"},{"text":"IV.1.","element":"span"},{"text":"3, we also ","element":"span"},{"text":"obtain that","element":"span"}],[{"style":{"width":"91%"},"width":919,"height":50,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/13-11.png","element":"img"}],[{"style":{"width":"100%"},"width":2058,"height":971,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-0.png","element":"img"}],[{"text":"Fig. V.10. ","element":"span"},{"id":"id-42","text":"Two trajectories of the brushbot returning to the origin by using the action-value function saved at ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= ","element":"span"},{"text":"1000. Red arrows show the trajectories. 
After being pushed away from the origin, the brushbot successfully returned to the origin again.","element":"span"}],[{"text":"Therefore, from Assumptions ","element":"span"},{"text":"IV.1.","element":"span"},{"text":"6 and ","element":"span"},{"text":"IV.1.","element":"span"},{"text":"7, and from ","element":"span"},{"style":{"height":12.8},"width":78.63,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-1.png","element":"img","alt":" νB ≥","inline":true,"padRight":true},{"text":"0, we obtain for ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"height":16},"width":188.93,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-2.png","element":"img","alt":"∗n ∈ {h ∈ Ω|","inline":true},{"style":{"fontStyle":"italic"},"text":"dist","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16.06},"width":130.21,"height":40.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-3.png","element":"img","alt":",Ω) = ∥","inline":true},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n 
","element":"span"},{"style":{"height":16.57},"width":133.71,"height":41.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-4.png","element":"img","alt":"−h∥Rr}","inline":true,"padRight":true},{"text":"that","element":"span"}],[{"style":{"height":16},"width":51.38,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-5.png","element":"img","alt":"|B(","inline":true},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":137.5,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-6.png","element":"img","alt":"+1)−B(","inline":true},{"text":"ˆ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":18.18},"width":280.79,"height":45.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-7.png","element":"img","alt":"+1)|2 −ρ21 ≤ ν2B ∥","inline":true},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":8.8},"width":76.18,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-8.png","element":"img","alt":"+1 −","inline":true,"padRight":true},{"text":"ˆ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":19.45},"width":191.71,"height":48.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-9.png","element":"img","alt":"+1∥2Rnx −ρ21","inline":true},{"style":{"height":18.18},"width":120.85,"height":45.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-10.png","element":"img","alt":"≤ 
ν2Bρ24","inline":true},{"style":{"fontStyle":"italic"},"text":"dist","element":"span"},{"text":"2","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":78.81,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-11.png","element":"img","alt":",Ωn)","inline":true}],[{"style":{"width":"66%"},"width":664,"height":216,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-12.png","element":"img"}],[{"text":"If ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":144.13,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-13.png","element":"img","alt":"+1) < B(","inline":true},{"text":"ˆ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":55.65,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-14.png","element":"img","alt":"+1)","inline":true},{"text":", then we obtain","element":"span"}],[{"id":"id-47","style":{"width":"92%"},"width":924,"height":318,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-15.png","element":"img"}],[{"text":"This inequality also holds in case when ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":147.16,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-16.png","element":"img","alt":"+1) ≥ 
B(","inline":true},{"text":"ˆ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":55.65,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-17.png","element":"img","alt":"+1)","inline":true},{"text":". Because of the continuity of the cost function ","element":"span"},{"style":{"height":13.59},"width":44.52,"height":33.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-18.png","element":"img","alt":" Θn","inline":true,"padRight":true},{"text":"and the barrier function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B ","element":"span"},{"text":"(Assumptions ","element":"span"},{"text":"IV.1.","element":"span"},{"text":"3 and ","element":"span"},{"text":"IV.1.","element":"span"},{"text":"5), the set ","element":"span"},{"style":{"height":11.6},"width":67.67,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-19.png","element":"img","alt":" C ×","inline":true},{"style":{"height":11.2},"width":31,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-20.png","element":"img","alt":"Ω","inline":true,"padRight":true},{"text":"is closed. 
We show that there exists a Lyapunov function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"V","element":"span"},{"style":{"height":8.4},"width":69.51,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-21.png","element":"img","alt":"C ×Ω","inline":true,"padRight":true},{"text":"with respect to the closed set ","element":"span"},{"style":{"height":11.6},"width":105.15,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-22.png","element":"img","alt":" C ×Ω","inline":true,"padRight":true},{"text":"for the augmented state ","element":"span"},{"text":"[","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"text":"]","element":"span"},{"text":". A candidate function is given by","element":"span"}],[{"style":{"width":"19%"},"width":199,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-23.png","element":"img"}],[{"style":{"height":48},"width":32,"height":120,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-24.png","element":"img","alt":"�","inline":true},{"text":"0 ","element":"span"},{"text":"if ","element":"span"},{"text":"[","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"height":16},"width":161.55,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-25.png","element":"img","alt":"] ∈ C 
×Ω","inline":true},{"style":{"height":4.4},"width":31,"height":11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-26.png","element":"img","alt":"−","inline":true},{"text":"min","element":"span"},{"style":{"height":26.93},"width":263.22,"height":67.32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-27.png","element":"img","alt":"(B(x),0)+ 2νBρ4ρ2ρ23","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"dist","element":"span"},{"text":"2","element":"span"},{"style":{"height":16},"width":404.34,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-28.png","element":"img","alt":"(h,Ω) if [x,h] /∈ C ×Ω","inline":true}],[{"text":"Since ","element":"span"},{"style":{"height":4.4},"width":31,"height":11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-29.png","element":"img","alt":" −","inline":true},{"text":"min","element":"span"},{"style":{"height":23.57},"width":264.44,"height":58.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-30.png","element":"img","alt":"(B(x),0) + 2νBρ4ρ3","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"dist","element":"span"},{"text":"2","element":"span"},{"style":{"height":16},"width":139.91,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-31.png","element":"img","alt":"(h,Ω) =","inline":true,"padRight":true},{"text":"0, ","element":"span"},{"style":{"height":16},"width":33.12,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-32.png","element":"img","alt":" ∀[","inline":true},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"height":16},"width":167.58,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-33.png","element":"img","alt":"] ∈ ∂(C 
×","inline":true},{"style":{"height":16},"width":46.61,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-34.png","element":"img","alt":"Ω)","inline":true},{"text":", where ","element":"span"},{"style":{"height":16},"width":168.75,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-35.png","element":"img","alt":" ∂(C × Ω)","inline":true,"padRight":true},{"text":"is the boundary of the set ","element":"span"},{"style":{"height":11.6},"width":113.06,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-36.png","element":"img","alt":" C × Ω","inline":true},{"text":", from Assumption ","element":"span"},{"text":"IV.1.","element":"span"},{"text":"3, the function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"V","element":"span"},{"style":{"height":8.4},"width":69.5,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-37.png","element":"img","alt":"C ×Ω","inline":true,"padRight":true},{"text":"is continuous. It also holds that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"V","element":"span"},{"style":{"height":16},"width":98.61,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-38.png","element":"img","alt":"C ×Ω([","inline":true},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"text":"]) ","element":"span"},{"style":{"fontStyle":"italic"},"text":"> ","element":"span"},{"text":"0 when ","element":"span"},{"text":"[","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"height":16},"width":164.22,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-39.png","element":"img","alt":"] /∈ C × Ω","inline":true},{"text":". 
Under Assumption ","element":"span"},{"text":"IV.1.","element":"span"},{"text":"6, we obtain","element":"span"}],[{"id":"id-48","style":{"width":"91%"},"width":923,"height":361,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-40.png","element":"img"}],[{"text":"for all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":14.81},"width":98.52,"height":37.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-41.png","element":"img","alt":" ∈ Z≥0","inline":true},{"text":". To show that the first inequality holds, we first show","element":"span"}],[{"style":{"width":"72%"},"width":723,"height":145,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-42.png","element":"img"}],[{"text":"(a) ","element":"span"},{"text":"For ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":71.02,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-43.png","element":"img","alt":") ≥","inline":true,"padRight":true},{"text":"0: ","element":"span"},{"text":"from ","element":"span"},{"href":"#id-17","text":"(IV.1)","element":"a"},{"text":", ","element":"span"},{"href":"#id-47","text":"(B.2)","element":"a"},{"text":", ","element":"span"},{"text":"and ","element":"span"},{"text":"0 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"< ","element":"span"},{"style":{"height":14},"width":84.32,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-44.png","element":"img","alt":"η ≤","inline":true,"padRight":true},{"text":"1, ","element":"span"},{"text":"we ","element":"span"},{"text":"obtain 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"(","element":"span"},{"text":"ˆ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":174.75,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-45.png","element":"img","alt":"+1) ≥ ρ1","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":112.01,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-46.png","element":"img","alt":"+1) ≥","inline":true},{"style":{"height":23},"width":153.48,"height":57.5,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-47.png","element":"img","alt":"− νBρ4ρ3 �[","inline":true},{"style":{"fontStyle":"italic"},"text":"dist","element":"span"},{"text":"2","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":98.07,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-48.png","element":"img","alt":",Ω)−","inline":true},{"style":{"fontStyle":"italic"},"text":"dist","element":"span"},{"text":"2","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":112.18,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-49.png","element":"img","alt":"+1,Ω)]","inline":true,"padRight":true},{"text":"from ","element":"span"},{"text":"which ","element":"span"},{"text":"it 
follows that","element":"span"}],[{"style":{"width":"96%"},"width":966,"height":87,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-50.png","element":"img"}],[{"text":"(b) For ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":") ","element":"span"},{"style":{"fontStyle":"italic"},"text":"< ","element":"span"},{"text":"0 and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":93.87,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-51.png","element":"img","alt":"+1) ≥","inline":true,"padRight":true},{"text":"0: it is straightforward to see that","element":"span"}],[{"style":{"width":"78%"},"width":791,"height":145,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-52.png","element":"img"}],[{"text":"(c) For ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":") ","element":"span"},{"style":{"fontStyle":"italic"},"text":"< ","element":"span"},{"text":"0 and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":96.87,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-53.png","element":"img","alt":"+1) <","inline":true,"padRight":true},{"text":"0: from 
","element":"span"},{"href":"#id-17","text":"(IV.1)","element":"a"},{"text":", ","element":"span"},{"href":"#id-47","text":"(B.2)","element":"a"},{"text":", and 0 ","element":"span"},{"style":{"height":14},"width":111.72,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-54.png","element":"img","alt":" < η ≤","inline":true,"padRight":true},{"text":"1, we obtain ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"(","element":"span"},{"text":"ˆ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":148.76,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-55.png","element":"img","alt":"+1) ≥ B(","inline":true},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":97.46,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-56.png","element":"img","alt":") + ρ1","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":93.19,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-57.png","element":"img","alt":"+1) −","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":23},"width":215.86,"height":57.49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-58.png","element":"img","alt":") ≥ − νBρ4ρ3 
√[","inline":true},{"style":{"fontStyle":"italic"},"text":"dist","element":"span"},{"text":"2","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":98.06,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-59.png","element":"img","alt":",Ω)−","inline":true},{"style":{"fontStyle":"italic"},"text":"dist","element":"span"},{"text":"2","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":112.18,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/14-60.png","element":"img","alt":"+1,Ω)]","inline":true,"padRight":true},{"text":"from which it follows that","element":"span"}],[{"style":{"height":4.4},"width":31,"height":11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-0.png","element":"img","alt":"−","inline":true},{"text":"min","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":142.51,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-1.png","element":"img","alt":"+1),0)+","inline":true},{"text":"min","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":")","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"text":"0","element":"span"},{"text":") = 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":97.84,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-2.png","element":"img","alt":")−B(","inline":true},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":55.65,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-3.png","element":"img","alt":"+1)","inline":true},{"style":{"height":34.13},"width":122.25,"height":85.32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-4.png","element":"img","alt":"≤ νBρ4ρ3","inline":true}],[{"text":"If ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":16},"width":86.54,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-5.png","element":"img","alt":" /∈ Ωn","inline":true},{"text":", under Assumption ","element":"span"},{"text":"IV.1.","element":"span"},{"text":"6, we obtain ","element":"span"},{"style":{"height":14.4},"width":122.63,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-6.png","element":"img","alt":" ρ2ρ3 
≤","inline":true},{"style":{"height":19.2},"width":50.86,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-7.png","element":"img","alt":"√[","inline":true},{"style":{"fontStyle":"italic"},"text":"dist","element":"span"},{"text":"2","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":98.07,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-8.png","element":"img","alt":",Ω)−","inline":true},{"style":{"fontStyle":"italic"},"text":"dist","element":"span"},{"text":"2","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":112.18,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-9.png","element":"img","alt":"+1,Ω)]","inline":true,"padRight":true},{"text":"from ","element":"span"},{"text":"which ","element":"span"},{"text":"it ","element":"span"},{"text":"follows that 
","element":"span"},{"style":{"height":19.2},"width":50.85,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-10.png","element":"img","alt":"√[","inline":true},{"style":{"fontStyle":"italic"},"text":"dist","element":"span"},{"text":"2","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":98.06,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-11.png","element":"img","alt":",Ω)−","inline":true},{"style":{"fontStyle":"italic"},"text":"dist","element":"span"},{"text":"2","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":22.18},"width":260.03,"height":55.45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-12.png","element":"img","alt":"+1,Ω)] ≤ 1/(ρ2ρ3) [","inline":true},{"style":{"fontStyle":"italic"},"text":"dist","element":"span"},{"text":"2","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":104.16,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-13.png","element":"img","alt":",Ω) 
−","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"dist","element":"span"},{"text":"2","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":112.19,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-14.png","element":"img","alt":"+1,Ω)]","inline":true,"padRight":true},{"text":"and","element":"span"}],[{"style":{"width":"74%"},"width":747,"height":354,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-15.png","element":"img"}],[{"text":"and the first inequality of ","element":"span"},{"href":"#id-48","text":"(B.3) ","element":"a"},{"text":"holds. The inequality also holds for ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":13.59},"width":80.1,"height":33.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-16.png","element":"img","alt":" ∈ Ωn","inline":true},{"text":". 
Moreover, if ","element":"span"},{"text":"[","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":157.98,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-17.png","element":"img","alt":"] ∈ C ×Ω","inline":true},{"text":", then ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"remains in ","element":"span"},{"style":{"height":11.2},"width":31,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-18.png","element":"img","alt":"Ω","inline":true,"padRight":true},{"text":"because of the monotonic approximation property. From ","element":"span"},{"href":"#id-47","text":"(B.2)","element":"a"},{"text":", the control barrier certificate ","element":"span"},{"text":"(III.1) ","element":"span"},{"text":"is thus ensured with a control input satisfying ","element":"span"},{"href":"#id-17","text":"(IV.1) ","element":"a"},{"text":"under Assumption ","element":"span"},{"text":"IV.1.","element":"span"},{"text":"4, and the set ","element":"span"},{"style":{"height":11.6},"width":106.69,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-19.png","element":"img","alt":" C ×Ω","inline":true,"padRight":true},{"text":"is forward invariant. Therefore, the system for the augmented state is stable with respect to the set ","element":"span"},{"style":{"height":11.6},"width":105.07,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-20.png","element":"img","alt":" C ×Ω","inline":true},{"text":". 
If ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":16},"width":80.95,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-21.png","element":"img","alt":" /∈ Ωn","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":14.81},"width":99.82,"height":37.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-22.png","element":"img","alt":" ∈ Z≥0","inline":true,"padRight":true},{"text":"such that ","element":"span"},{"text":"[","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":161.35,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-23.png","element":"img","alt":"] /∈ C ×Ω","inline":true},{"text":", it follows that","element":"span"}],[{"style":{"width":"91%"},"width":923,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-24.png","element":"img"}],[{"text":"and [53, Theorem 1] applies, i.e., the system for the augmented state is uniformly globally asymptotically stable with respect to the set ","element":"span"},{"style":{"height":11.6},"width":106.27,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-25.png","element":"img","alt":" C ×Ω","inline":true},{"text":".","element":"span"}]]},{"heading":"APPENDIX C","paragraphs":[[{"style":{"width":"39%"},"width":393,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-26.png","element":"img"}],[{"text":"Since 
","element":"span"},{"style":{"height":16},"width":491.22,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-27.png","element":"img","alt":" κ(u,v) = 1(u) = 1,∀u,v ∈ U","inline":true,"padRight":true},{"text":", is a positive definite kernel, it defines the unique RKHS given by span","element":"span"},{"style":{"fontStyle":"italic"},"text":"{","element":"span"},{"style":{"fontWeight":"bold"},"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":"}","element":"span"},{"text":", which is complete because it is a finite-dimensional space. For any ","element":"span"},{"style":{"height":19.67},"width":516.92,"height":49.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-28.png","element":"img","alt":"ϕ := α1 ∈ Hc, ⟨ϕ,ϕ⟩Hc = α2 ≥","inline":true,"padRight":true},{"text":"0 and the equality holds if and only if ","element":"span"},{"style":{"height":8.8},"width":68.67,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-29.png","element":"img","alt":" α =","inline":true,"padRight":true},{"text":"0, or equivalently, ","element":"span"},{"style":{"height":12},"width":67.56,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-30.png","element":"img","alt":" ϕ =","inline":true,"padRight":true},{"text":"0. The symmetry and the linearity also hold, and hence ","element":"span"},{"style":{"height":18.34},"width":108.04,"height":45.86,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-31.png","element":"img","alt":" ⟨·,·⟩Hc","inline":true,"padRight":true},{"text":"defines the inner product. 
For any ","element":"span"},{"style":{"fontWeight":"bold"},"text":"u ","element":"span"},{"style":{"height":12.4},"width":73.24,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-32.png","element":"img","alt":" ∈ U","inline":true,"padRight":true},{"text":", it holds that ","element":"span"},{"style":{"height":18.41},"width":539.42,"height":46.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-33.png","element":"img","alt":" ⟨ϕ,κ(·,u)⟩Hc = ⟨α1,1⟩Hc = α =","inline":true},{"style":{"height":16},"width":80.96,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-34.png","element":"img","alt":"ϕ(u)","inline":true},{"text":". Therefore, the reproducing property is satisfied.","element":"span"}]]},{"heading":"APPENDIX D","paragraphs":[[{"text":"P","element":"span"},{"text":"ROOF OF ","element":"span"},{"text":"T","element":"span"},{"text":"HEOREM ","element":"span"},{"href":"#id-14","text":"IV.2 ","element":"a"},{"text":"The following lemmas are used to prove the theorem.","element":"span"}],[{"id":"id-51","style":{"fontWeight":"bold"},"text":"Lemma D.1 ","element":"span"},{"text":"( [54, Theorem 2])","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"text":"Let ","element":"span"},{"style":{"height":12.4},"width":156.67,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-35.png","element":"img","alt":" X ⊂ Rnx","inline":true,"padRight":true},{"text":"be any set with nonempty interior. 
Then, the RKHS associated with the Gaussian kernel for an arbitrary scale parameter ","element":"span"},{"style":{"height":9.6},"width":69.24,"height":24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-36.png","element":"img","alt":" σ >","inline":true,"padRight":true},{"text":"0 does not contain any polynomial on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":", including the nonzero constant function.","element":"span"}],[{"id":"id-49","style":{"fontWeight":"bold"},"text":"Lemma D.2. ","element":"span"},{"text":"Assume that ","element":"span"},{"style":{"height":12.4},"width":161.57,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-37.png","element":"img","alt":" X ⊂ Rnx","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":12.4},"width":155.71,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-38.png","element":"img","alt":" U ⊂ Rnu","inline":true,"padRight":true},{"text":"have nonempty interiors. 
Then, the intersection of the RKHS ","element":"span"},{"style":{"height":13.99},"width":53.62,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-39.png","element":"img","alt":" Hu","inline":true,"padRight":true},{"text":"associated with the kernel ","element":"span"},{"style":{"height":17.79},"width":410.89,"height":44.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-40.png","element":"img","alt":" κ (u,v) := uTv, u,v ∈ U","inline":true,"padRight":true},{"text":", and the RKHS ","element":"span"},{"style":{"height":13.99},"width":51.61,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-41.png","element":"img","alt":" Hc","inline":true,"padRight":true},{"text":"is ","element":"span"},{"style":{"fontStyle":"italic"},"text":"{","element":"span"},{"text":"0","element":"span"},{"style":{"fontStyle":"italic"},"text":"}","element":"span"},{"text":", i.e.,","element":"span"}],[{"style":{"width":"26%"},"width":263,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-42.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"It is obvious that the function ","element":"span"},{"style":{"height":16},"width":301.22,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-43.png","element":"img","alt":" ϕ(u) = 0,∀u ∈ U","inline":true,"padRight":true},{"text":", is an element of both of the RKHSs (vector spaces) ","element":"span"},{"style":{"height":13.99},"width":53.62,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-44.png","element":"img","alt":" Hu","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":13.99},"width":51.62,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-45.png","element":"img","alt":"Hc","inline":true},{"text":". 
Therefore, it is sufficient to show that there exists ","element":"span"},{"style":{"fontWeight":"bold"},"text":"u ","element":"span"},{"style":{"height":12.4},"width":73.79,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-46.png","element":"img","alt":" ∈ U","inline":true,"padRight":true},{"text":"satisfying ","element":"span"},{"style":{"height":16},"width":177.28,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-47.png","element":"img","alt":" ϕ(u) ̸= ϕ(","inline":true},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"text":"int","element":"span"},{"text":")","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"text":"int ","element":"span"},{"style":{"height":9.6},"width":27,"height":24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-48.png","element":"img","alt":"∈","inline":true,"padRight":true},{"text":"int","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"U ","element":"span"},{"text":")","element":"span"},{"text":", where int","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"U ","element":"span"},{"text":") ","element":"span"},{"text":"denotes the interior of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"U ","element":"span"},{"text":", for any ","element":"span"},{"style":{"height":16},"width":215.47,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-49.png","element":"img","alt":" ϕ ∈ Hu \\{0}","inline":true},{"text":". 
Assume that ","element":"span"},{"style":{"height":16},"width":117.77,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-50.png","element":"img","alt":"ϕ(v) ̸=","inline":true,"padRight":true},{"text":"0 for some ","element":"span"},{"style":{"fontWeight":"bold"},"text":"v ","element":"span"},{"style":{"height":12.4},"width":74.12,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-51.png","element":"img","alt":" ∈ U","inline":true,"padRight":true},{"text":". From [51, Theorem 3], the RKHS ","element":"span"},{"style":{"height":13.99},"width":53.62,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-52.png","element":"img","alt":"Hu","inline":true,"padRight":true},{"text":"is expressed as ","element":"span"},{"style":{"height":13.99},"width":96.84,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-53.png","element":"img","alt":" Hu =","inline":true,"padRight":true},{"text":"span","element":"span"},{"style":{"height":16},"width":213.08,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-54.png","element":"img","alt":"{κ (·,u)}u∈U","inline":true,"padRight":true},{"text":", which is finite-dimensional, implying that any function in ","element":"span"},{"style":{"height":13.99},"width":53.62,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-55.png","element":"img","alt":" Hu","inline":true,"padRight":true},{"text":"is linear. 
Since there exists ","element":"span"},{"style":{"fontWeight":"bold"},"text":"u ","element":"span"},{"text":"= ","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"text":"int ","element":"span"},{"style":{"height":15.6},"width":178.75,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-56.png","element":"img","alt":"+ρ5v ∈ U","inline":true,"padRight":true},{"text":"for some ","element":"span"},{"style":{"height":12.8},"width":78.64,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-57.png","element":"img","alt":" ρ5 >","inline":true,"padRight":true},{"text":"0, it is proved that","element":"span"}],[{"style":{"width":"91%"},"width":921,"height":117,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-58.png","element":"img"}],[{"id":"id-50","style":{"fontWeight":"bold"},"text":"Lemma D.3 ","element":"span"},{"text":"( [55, Proposition 1.3])","element":"span"},{"style":{"fontWeight":"bold"},"text":". 
","element":"span"},{"text":"If ","element":"span"},{"style":{"height":14.01},"width":282.78,"height":35.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-59.png","element":"img","alt":" H2 = H21 ⊕H22","inline":true,"padRight":true},{"text":"for given vector spaces ","element":"span"},{"style":{"height":14.01},"width":53.62,"height":35.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-60.png","element":"img","alt":" H1","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":14.01},"width":53.62,"height":35.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-61.png","element":"img","alt":" H2","inline":true},{"text":", then ","element":"span"},{"style":{"height":14.01},"width":398.96,"height":35.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-62.png","element":"img","alt":" H1⊗H21∩H1⊗H22 =","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"{","element":"span"},{"text":"0","element":"span"},{"style":{"fontStyle":"italic"},"text":"}","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"text":"i.e., ","element":"span"},{"style":{"height":16},"width":651.25,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-63.png","element":"img","alt":" H1 ⊗H2 = (H1 ⊗H21)⊕(H1 ⊗H22).","inline":true}],[{"id":"id-52","style":{"fontWeight":"bold"},"text":"Lemma D.4. 
","element":"span"},{"text":"Given ","element":"span"},{"style":{"height":12.4},"width":167.84,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-64.png","element":"img","alt":" X ⊂ Rnx","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":12.4},"width":161.98,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-65.png","element":"img","alt":" U ⊂ Rnu","inline":true},{"text":", let ","element":"span"},{"style":{"height":14.01},"width":53.62,"height":35.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-66.png","element":"img","alt":" H1","inline":true},{"text":", ","element":"span"},{"style":{"height":14.01},"width":53.62,"height":35.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-67.png","element":"img","alt":"H2","inline":true},{"text":", and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H ","element":"span"},{"text":"be associated with the Gaussian kernels","element":"span"}],[{"style":{"width":"99%"},"width":999,"height":261,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-68.png","element":"img"}],[{"text":"spectively, for an arbitrary ","element":"span"},{"style":{"height":9.6},"width":74.03,"height":24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-69.png","element":"img","alt":" σ >","inline":true,"padRight":true},{"text":"0. 
Then, by regarding a function in ","element":"span"},{"style":{"height":14.01},"width":157.82,"height":35.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-70.png","element":"img","alt":" H1 ⊗ H2","inline":true,"padRight":true},{"text":"as a function over the input space ","element":"span"},{"style":{"height":13.39},"width":286.67,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-71.png","element":"img","alt":"X ×U ⊂ Rnx+nu","inline":true},{"text":", it holds that","element":"span"}],[{"style":{"width":"25%"},"width":260,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-72.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"style":{"height":14.01},"width":151,"height":35.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-73.png","element":"img","alt":" H1 ⊗H2","inline":true,"padRight":true},{"text":"has the reproducing kernel defined by","element":"span"}],[{"style":{"width":"81%"},"width":823,"height":471,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-74.png","element":"img"}],[{"text":"This verifies the claim.","element":"span"}],[{"text":"We are now ready to prove Theorem ","element":"span"},{"href":"#id-14","text":"IV.2.","element":"a"}],[{"style":{"fontStyle":"italic"},"text":"Proof of Theorem ","element":"span"},{"href":"#id-14","style":{"fontStyle":"italic"},"text":"IV.2. ","element":"a"},{"text":"By Lemmas ","element":"span"},{"href":"#id-49","text":"D.2 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-50","text":"D.3, ","element":"a"},{"text":"it is derived that ","element":"span"},{"style":{"height":16.84},"width":459.36,"height":42.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-75.png","element":"img","alt":" H f ⊗ Hc ∩ Hg ⊗ Hu = {0}","inline":true},{"text":". 
By Lemmas ","element":"span"},{"href":"#id-51","text":"D.1, ","element":"a"},{"href":"#id-50","text":"D.3, ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-52","text":"D.4, ","element":"a"},{"text":"it holds that ","element":"span"},{"style":{"height":16.84},"width":352.32,"height":42.1,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-76.png","element":"img","alt":" Hp ∩Hf ⊗Hc = {0}","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":16.39},"width":283.9,"height":40.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-77.png","element":"img","alt":" Hp ∩Hg ⊗Hu =","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"{","element":"span"},{"text":"0","element":"span"},{"style":{"fontStyle":"italic"},"text":"}","element":"span"},{"text":".","element":"span"}]]},{"heading":"APPENDIX E","paragraphs":[[{"style":{"width":"43%"},"width":434,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-78.png","element":"img"}],[{"text":"We show that the operator ","element":"span"},{"style":{"fontStyle":"italic"},"text":"U ","element":"span"},{"text":": ","element":"span"},{"style":{"height":18.75},"width":202.27,"height":46.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-79.png","element":"img","alt":" HQ → HψQ","inline":true},{"text":", which maps ","element":"span"},{"style":{"height":17.83},"width":151.88,"height":44.57,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-80.png","element":"img","alt":"ϕQ ∈ HQ","inline":true,"padRight":true},{"text":"to a function ","element":"span"},{"style":{"height":19.15},"width":220.79,"height":47.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-81.png","element":"img","alt":" ϕ ∈ 
HψQ,ϕ([","inline":true},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"w","element":"span"},{"style":{"height":17.39},"width":341.77,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-82.png","element":"img","alt":"]) = ϕQ(z)−γϕQ(w)","inline":true,"padRight":true},{"text":"where ","element":"span"},{"style":{"height":16},"width":321.88,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/15-83.png","element":"img","alt":" γ ∈ (0,1), z,w ∈ Z","inline":true,"padRight":true},{"text":", is bijective. Because the mapping ","element":"span"},{"style":{"fontStyle":"italic"},"text":"U ","element":"span"},{"text":"is surjective by definition, we show it is also injective. For any ","element":"span"},{"style":{"height":20.19},"width":220.57,"height":50.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-0.png","element":"img","alt":" ϕQ1 ,ϕQ2 ∈ HQ","inline":true},{"text":",","element":"span"}],[{"style":{"width":"87%"},"width":882,"height":182,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-1.png","element":"img"}],[{"text":"and","element":"span"}],[{"style":{"width":"74%"},"width":751,"height":182,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-2.png","element":"img"}],[{"text":"from which the linearity holds. Therefore, it is sufficient to show that ker","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"U","element":"span"},{"text":") = ","element":"span"},{"text":"0 [56]. 
For any ","element":"span"},{"style":{"height":16.99},"width":86.48,"height":42.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-3.png","element":"img","alt":" ϕQ ∈","inline":true,"padRight":true},{"text":"ker","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"U","element":"span"},{"text":")","element":"span"},{"text":", we obtain","element":"span"}],[{"style":{"width":"70%"},"width":712,"height":46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-4.png","element":"img"}],[{"text":"which implies that ","element":"span"},{"style":{"height":16.99},"width":90.48,"height":42.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-5.png","element":"img","alt":" ϕQ =","inline":true,"padRight":true},{"text":"0.","element":"span"}],[{"text":"Next, we show that ","element":"span"},{"style":{"height":18.75},"width":78.29,"height":46.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-6.png","element":"img","alt":" HψQ","inline":true,"padRight":true},{"text":"is an RKHS. The space ","element":"span"},{"style":{"height":18.75},"width":78.29,"height":46.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-7.png","element":"img","alt":" HψQ","inline":true,"padRight":true},{"text":"with the inner product defined in ","element":"span"},{"href":"#id-53","text":"(IV.2) ","element":"a"},{"text":"is isometric to the RKHS ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"text":", and hence is a Hilbert space. 
Because ","element":"span"},{"style":{"height":17.39},"width":160.62,"height":43.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-8.png","element":"img","alt":" κQ(·,z) −","inline":true},{"style":{"height":17.83},"width":259.83,"height":44.57,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-9.png","element":"img","alt":"γκQ(·,w) ∈ HQ","inline":true},{"text":", it is true that ","element":"span"},{"style":{"height":16},"width":78.15,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-10.png","element":"img","alt":" κ(·,[","inline":true},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"w","element":"span"},{"style":{"height":19.15},"width":150.65,"height":47.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-11.png","element":"img","alt":"]) ∈ HψQ","inline":true},{"text":". Moreover, it holds that","element":"span"}],[{"style":{"height":16.06},"width":93.61,"height":40.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-12.png","element":"img","alt":"⟨κ(·,[","inline":true},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"w","element":"span"},{"style":{"height":16},"width":120.12,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-13.png","element":"img","alt":"]),κ(·,[","inline":true},{"text":"˜","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"text":"; 
˜","element":"span"},{"style":{"fontWeight":"bold"},"text":"w","element":"span"},{"style":{"height":24.18},"width":105.92,"height":60.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-14.png","element":"img","alt":"])⟩HψQ","inline":true}],[{"style":{"width":"85%"},"width":861,"height":125,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-15.png","element":"img"}],[{"style":{"height":16},"width":91.47,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-16.png","element":"img","alt":"= κ([","inline":true},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"w","element":"span"},{"text":"]","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"text":"[","element":"span"},{"text":"˜","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"text":"; ˜","element":"span"},{"style":{"fontWeight":"bold"},"text":"w","element":"span"},{"text":"])","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"}],[{"text":"and that","element":"span"}],[{"style":{"height":16.06},"width":120.95,"height":40.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-17.png","element":"img","alt":"⟨ϕ,κ(·,","inline":true},{"text":"[","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"w","element":"span"},{"text":"])","element":"span"},{"style":{"height":26.3},"width":305.2,"height":65.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-18.png","element":"img","alt":"⟩HψQ 
=�ϕQ,κQ(·,","inline":true},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"text":")","element":"span"},{"style":{"height":18.18},"width":142.15,"height":45.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-19.png","element":"img","alt":"−γκQ(·,","inline":true},{"style":{"fontWeight":"bold"},"text":"w","element":"span"},{"text":")","element":"span"},{"style":{"height":24.25},"width":64.38,"height":60.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-20.png","element":"img","alt":"�HQ","inline":true},{"style":{"height":17.78},"width":88.16,"height":44.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-21.png","element":"img","alt":"= ϕQ","inline":true},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"text":")","element":"span"},{"style":{"height":17.78},"width":104.52,"height":44.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-22.png","element":"img","alt":"−γϕQ","inline":true},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"w","element":"span"},{"style":{"height":16},"width":91.14,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-23.png","element":"img","alt":") = ϕ","inline":true},{"text":"([","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"w","element":"span"},{"text":"])","element":"span"},{"style":{"height":18.75},"width":211.57,"height":46.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-24.png","element":"img","alt":", ∀ϕ ∈ HψQ.","inline":true}],[{"text":"Therefore, ","element":"span"},{"style":{"height":17.39},"width":380.82,"height":43.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-25.png","element":"img","alt":" κ(·,·) : Z 2 × Z 2 → R","inline":true,"padRight":true},{"text":"is the reproducing 
kernel with which the RKHS ","element":"span"},{"style":{"height":18.75},"width":78.3,"height":46.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-26.png","element":"img","alt":" HψQ","inline":true,"padRight":true},{"text":"is associated.","element":"span"}]]},{"heading":"APPENDIX F","paragraphs":[[{"style":{"width":"99%"},"width":1002,"height":373,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-27.png","element":"img"}]]},{"heading":"APPENDIX G","paragraphs":[[{"style":{"width":"43%"},"width":435,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-28.png","element":"img"}],[{"text":"The line integral of ","element":"span"},{"style":{"height":22.44},"width":73.43,"height":56.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-29.png","element":"img","alt":"∂B(x)∂x","inline":true,"padRight":true},{"text":"is path independent because it is the gradient of the scalar field ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B ","element":"span"},{"text":"[57]. 
Let ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"height":16},"width":183.66,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-30.png","element":"img","alt":"(t) := (1 −","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"+","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":9.61},"width":80.78,"height":24.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-31.png","element":"img","alt":"+1 =","inline":true,"padRight":true},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"+","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":"( ","element":"span"},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":") + ","element":"span"},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":")","element":"span"},{"text":", where 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"style":{"height":16},"width":114.02,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-32.png","element":"img","alt":" ∈ [0,1]","inline":true,"padRight":true},{"text":"parameterizes the line path between ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":8.8},"width":37.91,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-33.png","element":"img","alt":"+1","inline":true},{"text":", then ","element":"span"},{"style":{"fontStyle":"italic"},"text":"dB","element":"span"},{"style":{"height":22.19},"width":113.85,"height":55.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-34.png","element":"img","alt":"(x(t))dt =","inline":true}],[{"style":{"height":17.75},"width":90.37,"height":44.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-35.png","element":"img","alt":"∂x (","inline":true,"padRight":true},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":")+ 
","element":"span"},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":")","element":"span"},{"text":". Therefore, for any path ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"from ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"}],[{"text":"to ˆ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":11.21},"width":90.68,"height":28.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-36.png","element":"img","alt":"+1 :=","inline":true,"padRight":true},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"+ ","element":"span"},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":")+ 
","element":"span"},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":", it holds under Assumption ","element":"span"},{"text":"IV.2.","element":"span"},{"text":"2 that","element":"span"}],[{"style":{"width":"81%"},"width":816,"height":198,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-37.png","element":"img"}],[{"text":"( ","element":"span"},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":")+ ","element":"span"},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":")","element":"span"},{"text":"d","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"}],[{"id":"id-54","style":{"width":"95%"},"width":955,"height":127,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-38.png","element":"img"}],[{"text":"The inequality implies that 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"(","element":"span"},{"text":"ˆ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":136.62,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-39.png","element":"img","alt":"+1)−B(","inline":true},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":") ","element":"span"},{"text":"is greater than or equal to that in the case when ","element":"span"},{"style":{"height":22.44},"width":73.43,"height":56.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-40.png","element":"img","alt":"∂B(x)∂x","inline":true,"padRight":true},{"text":"decreases along the line path at the maximum rate. Therefore, when ","element":"span"},{"href":"#id-27","text":"(IV.7) ","element":"a"},{"text":"is satisfied, it holds from ","element":"span"},{"href":"#id-54","text":"(G.1) ","element":"a"},{"text":"that","element":"span"}],[{"style":{"width":"55%"},"width":553,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-41.png","element":"img"}],[{"text":"which is the control barrier certificate defined in ","element":"span"},{"href":"#id-17","text":"(IV.1)","element":"a"},{"text":". Hence, ","element":"span"},{"text":"(III.1) ","element":"span"},{"text":"is satisfied by the same argument as in the proof of Theorem ","element":"span"},{"href":"#id-1","text":"IV.1 ","element":"a"},{"text":"under Assumption ","element":"span"},{"text":"IV.1.","element":"span"},{"text":"3. 
Equation ","element":"span"},{"href":"#id-27","text":"(IV.7) ","element":"a"},{"text":"can be rewritten as ","element":"span"},{"style":{"height":16},"width":64.98,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-42.png","element":"img","alt":"∂B(","inline":true},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":") ","element":"span"},{"style":{"height":12.4},"width":23,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-43.png","element":"img","alt":"∂","inline":true},{"style":{"fontWeight":"bold"},"text":"x ","element":"span"},{"text":"( ","element":"span"},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":")+ ","element":"span"},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":") ","element":"span"},{"style":{"height":19.19},"width":64.28,"height":47.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-44.png","element":"img","alt":"− ν","inline":true},{"text":"2","element":"span"}],[{"text":"The first term on the left-hand side of ","element":"span"},{"text":"(G.2) ","element":"span"},{"text":"is affine in 
","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":", the second term is the composition of the concave function ","element":"span"},{"style":{"height":20.44},"width":158.94,"height":51.09,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-45.png","element":"img","alt":"− ν2 ∥·∥2Rnx","inline":true,"padRight":true},{"text":"with an affine function of ","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":", and is therefore concave. ","element":"span"},{"text":"Therefore, the left-hand side of ","element":"span"},{"text":"(G.2) ","element":"span"},{"text":"is a concave function, and the inequality ","element":"span"},{"text":"(G.2) ","element":"span"},{"text":"defines a convex constraint under Assumption ","element":"span"},{"text":"IV.2.","element":"span"},{"text":"1.","element":"span"}]]},{"heading":"APPENDIX H","paragraphs":[[{"style":{"width":"77%"},"width":776,"height":76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-46.png","element":"img"}],[{"text":"The kernel adaptive filter [58] is an adaptive extension of kernel ridge regression [59], [60] or GPs. The multikernel adaptive filter [61] exploits multiple kernels to conduct learning in the sum space of the RKHSs associated with each kernel. Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"M ","element":"span"},{"style":{"height":14.01},"width":97.71,"height":35.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-47.png","element":"img","alt":" ∈ Z>0","inline":true,"padRight":true},{"text":"be the number of kernels employed. For simplicity, we only discuss the case in which the dimension of the model parameter ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h ","element":"span"},{"text":"is fixed. 
Denote, by ","element":"span"},{"style":{"height":16},"width":214.76,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-48.png","element":"img","alt":" Dm := {κm(·,","inline":true,"padRight":true},{"text":"˜","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"style":{"height":18.07},"width":319.86,"height":45.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-49.png","element":"img","alt":", j)}j∈{1,2,...,rm}, m ∈","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"{","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"text":"2","element":"span"},{"style":{"fontStyle":"italic"},"text":",...,","element":"span"},{"style":{"fontStyle":"italic"},"text":"M","element":"span"},{"style":{"fontStyle":"italic"},"text":"}","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"style":{"fontStyle":"italic"},"text":"r","element":"span"},{"style":{"fontStyle":"italic"},"text":"m ","element":"span"},{"style":{"height":14.01},"width":102.92,"height":35.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-50.png","element":"img","alt":" ∈ Z>0","inline":true},{"text":", the time-dependent set of functions, referred to as a ","element":"span"},{"style":{"fontStyle":"italic"},"text":"dictionary","element":"span"},{"text":", at time instant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"for the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"text":"th kernel ","element":"span"},{"style":{"height":16},"width":114.17,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-51.png","element":"img","alt":"κm(·,·)","inline":true},{"text":". 
The current estimator ˆ","element":"span"},{"style":{"height":12},"width":42.34,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-52.png","element":"img","alt":"ψn","inline":true,"padRight":true},{"text":"is evaluated at the current input ","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":", in a linear form, as","element":"span"}],[{"id":"id-55","style":{"width":"59%"},"width":598,"height":104,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-53.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":":","element":"span"},{"text":"= ","element":"span"},{"text":"[","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"text":"1","element":"span"},{"style":{"height":7.6},"width":23.17,"height":19,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-54.png","element":"img","alt":",n","inline":true},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"text":"2","element":"span"},{"style":{"height":12.41},"width":82.37,"height":31.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-55.png","element":"img","alt":",n;···","inline":true,"padRight":true},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"M","element":"span"},{"style":{"height":16.79},"width":152.74,"height":41.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-56.png","element":"img","alt":",n] := 
[","inline":true},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":"1","element":"span"},{"text":";","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":"2","element":"span"},{"text":";","element":"span"},{"style":{"height":4.8},"width":41.96,"height":12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-57.png","element":"img","alt":"···","inline":true,"padRight":true},{"text":";","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"r","element":"span"},{"style":{"height":16},"width":69.91,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-58.png","element":"img","alt":"] ∈","inline":true},{"style":{"height":14.19},"width":53.92,"height":35.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-59.png","element":"img","alt":"Rr,","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"r ","element":"span"},{"text":":","element":"span"},{"style":{"height":18.04},"width":148.46,"height":45.1,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-60.png","element":"img","alt":"= ∑Mm=1","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"r","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"text":", ","element":"span"},{"text":"is ","element":"span"},{"text":"the ","element":"span"},{"text":"coefficient ","element":"span"},{"text":"vector, ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontWeight":"bold"},"text":"k","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":") ","element":"span"},{"text":":","element":"span"},{"text":"= 
[","element":"span"},{"style":{"fontWeight":"bold"},"text":"k","element":"span"},{"text":"1","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":")","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"k","element":"span"},{"text":"2","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":72.92,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-61.png","element":"img","alt":");···","inline":true,"padRight":true},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"k","element":"span"},{"style":{"fontStyle":"italic"},"text":"M","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16},"width":153.01,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-62.png","element":"img","alt":")] ∈ Rr","inline":true},{"text":", ","element":"span"},{"style":{"fontWeight":"bold"},"text":"k","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":") ","element":"span"},{"text":":","element":"span"},{"text":"= ","element":"span"},{"style":{"height":16},"width":76.63,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-63.png","element":"img","alt":"[κm (","inline":true},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"fontStyle":"italic"},"text":", 
","element":"span"},{"text":"˜","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"style":{"height":16.81},"width":125.89,"height":42.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-64.png","element":"img","alt":",1);κm (","inline":true},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"text":"˜","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"style":{"height":16.81},"width":192.22,"height":42.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-65.png","element":"img","alt":",2);··· ;κm (","inline":true},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"text":"˜","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"style":{"height":16.79},"width":224.85,"height":41.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-66.png","element":"img","alt":",rm)] ∈ Rrm","inline":true},{"text":". 
","element":"span"},{"text":"To obtain a sparse model parameter, we define the cost at time instant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"as","element":"span"}],[{"style":{"width":"91%"},"width":922,"height":99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/16-67.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":16.01},"width":538.81,"height":40.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-0.png","element":"img","alt":" ι ∈ {n−s+1,n} ⊂ Z≥0, s ∈ Z>0","inline":true},{"text":", and","element":"span"}],[{"style":{"width":"91%"},"width":923,"height":46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-1.png","element":"img"}],[{"text":"which is the set of coefficient vectors ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h ","element":"span"},{"text":"whose instantaneous error is zero up to a precision parameter ","element":"span"},{"style":{"height":10.81},"width":32.49,"height":27.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-2.png","element":"img","alt":" ε1","inline":true},{"text":". 
Here, ","element":"span"},{"style":{"height":14.39},"width":118.52,"height":35.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-3.png","element":"img","alt":" δn ∈ R","inline":true,"padRight":true},{"text":"is the output at time instant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":", and the ","element":"span"},{"style":{"height":7.6},"width":31.58,"height":19,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-4.png","element":"img","alt":" ℓ1","inline":true},{"text":"-norm regularization ","element":"span"},{"style":{"height":16.58},"width":228.72,"height":41.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-5.png","element":"img","alt":"∥h∥1 := ∑ri=1 |","inline":true},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"i","element":"span"},{"style":{"fontStyle":"italic"},"text":"| ","element":"span"},{"text":"with a parameter ","element":"span"},{"style":{"height":14.4},"width":65.38,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-6.png","element":"img","alt":" µ ≥","inline":true,"padRight":true},{"text":"0 promotes sparsity of ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"text":". 
The update rule of the adaptive proximal forward-backward splitting [62], which is an adaptive filter designed for sparse optimizations, for the cost ","element":"span"},{"href":"#id-55","text":"(H.1) ","element":"a"},{"text":"is given by","element":"span"}],[{"style":{"width":"91%"},"width":923,"height":121,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-7.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":16},"width":155.77,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-8.png","element":"img","alt":" λ ∈ (0,2)","inline":true,"padRight":true},{"text":"is the step size, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"I ","element":"span"},{"text":"is the identity operator, and","element":"span"}],[{"style":{"width":"71%"},"width":717,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-9.png","element":"img"}],[{"text":"where ","element":"span"},{"text":"sgn","element":"span"},{"style":{"height":16},"width":42.5,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-10.png","element":"img","alt":"(·)","inline":true,"padRight":true},{"text":"is ","element":"span"},{"text":"the ","element":"span"},{"text":"sign ","element":"span"},{"text":"function. 
","element":"span"},{"text":"Then, ","element":"span"},{"text":"the ","element":"span"},{"text":"strictly monotone approximation property [62]: ","element":"span"},{"style":{"height":16},"width":20,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-11.png","element":"img","alt":" ∥","inline":true},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16.72},"width":224.25,"height":41.8,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-12.png","element":"img","alt":"+1 −h∗n∥Rr <","inline":true},{"style":{"height":16},"width":20,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-13.png","element":"img","alt":"∥","inline":true},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":16.72},"width":321.75,"height":41.8,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-14.png","element":"img","alt":" −h∗n∥Rr , ∀h∗n ∈ Ωn","inline":true,"padRight":true},{"text":":","element":"span"},{"text":"= ","element":"span"},{"text":"argmin","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"height":16.84},"width":159.08,"height":42.1,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-15.png","element":"img","alt":"∈Rr Θn(h)","inline":true},{"text":", holds if ","element":"span"},{"style":{"fontWeight":"bold"},"text":"h","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"height":16},"width":27,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-16.png","element":"img","alt":"/∈","inline":true},{"style":{"height":15.2},"width":87.2,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-17.png","element":"img","alt":"Ωn 
≠","inline":true,"padRight":true},{"text":"∅.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Dictionary Construction: ","element":"span"},{"text":"If the dictionary is insufficient, we can employ two novelty conditions when adding the kernel functions ","element":"span"},{"style":{"height":16},"width":102.59,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-18.png","element":"img","alt":" {κm(·,","inline":true},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":18.07},"width":210.31,"height":45.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-19.png","element":"img","alt":")}m∈{1,2,...,M}","inline":true,"padRight":true},{"text":"to the dictionary: (i) the maximum-dictionary-size condition","element":"span"}],[{"style":{"width":"34%"},"width":344,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-20.png","element":"img"}],[{"text":"and (ii) the large-normalized-error condition","element":"span"}],[{"style":{"width":"58%"},"width":589,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-21.png","element":"img"}],[{"text":"By using sparse optimizations, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"nonactive ","element":"span"},{"text":"structural components represented by some kernel functions can be removed, and the dictionary is refined over time. To achieve a compact representation of the model effectively, it may be necessary to weight the kernel functions appropriately so as to encode preferences on the structure of the model. The following lemma implies that the weighted kernels are still reproducing kernels.","element":"span"}],[{"id":"id-40","style":{"fontWeight":"bold"},"text":"Lemma H.1 ","element":"span"},{"text":"([63, Theorem 2])","element":"span"},{"style":{"fontWeight":"bold"},"text":". 
","element":"span"},{"text":"Let ","element":"span"},{"style":{"height":12},"width":299.98,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-22.png","element":"img","alt":" κ : Z × Z → R","inline":true,"padRight":true},{"text":"be the reproducing kernel of an RKHS ","element":"span"},{"style":{"height":16.57},"width":205.1,"height":41.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-23.png","element":"img","alt":" (H ,⟨·,·⟩H )","inline":true},{"text":". Then, ","element":"span"},{"style":{"height":16},"width":314.15,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-24.png","element":"img","alt":"τκ(z,w), z,w ∈ Z","inline":true,"padRight":true},{"text":"for an arbitrary ","element":"span"},{"style":{"height":9.6},"width":61.59,"height":24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-25.png","element":"img","alt":" τ >","inline":true,"padRight":true},{"text":"0 is the reproducing kernel of the RKHS ","element":"span"},{"style":{"height":18.74},"width":216.43,"height":46.86,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-26.png","element":"img","alt":" (Hτ,⟨·,·⟩Hτ)","inline":true,"padRight":true},{"text":"with the inner product ","element":"span"},{"style":{"height":20.07},"width":571.61,"height":50.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-27.png","element":"img","alt":"⟨z,w⟩Hτ := τ−1 ⟨z,w⟩H , z,w ∈ Z","inline":true,"padRight":true},{"text":".","element":"span"}]]},{"heading":"APPENDIX I","paragraphs":[[{"text":"COMPARISON TO PARAMETRIC APPROACHES AND THE GP SARSA","element":"span"}],[{"text":"If a suitable set of basis functions for approximating action-value functions is available, we can adopt a parametric approach 
for action-value function approximation. Suppose that an estimate of the action-value function at time instant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"is given by ","element":"span"},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"style":{"height":20.14},"width":240.68,"height":50.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-28.png","element":"img","alt":"φn(z) = hTn ζ(z)","inline":true},{"text":", where ","element":"span"},{"style":{"height":16},"width":203.07,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-29.png","element":"img","alt":" ζ : Z → Rr","inline":true,"padRight":true},{"text":"is fixed ","element":"span"},{"text":"for all time. In this parametric case, given an input-output","element":"span"}],[{"style":{"width":"99%"},"width":1004,"height":217,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-30.png","element":"img"}],[{"text":"Then, stable tracking is achieved if the step size ","element":"span"},{"style":{"height":12.4},"width":23,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-31.png","element":"img","alt":" λ","inline":true,"padRight":true},{"text":"is properly selected, even after the dynamics or the policy changes.","element":"span"}],[{"text":"On the other hand, when employing kernel-based learning, it is not obvious how to update the estimate in a theoretically sound manner. 
Because the output of the action-value function is not directly observable, the expansion ","element":"span"},{"style":{"height":17.7},"width":166.66,"height":44.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-32.png","element":"img","alt":" ∑ni=0 κQ(·,","inline":true},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":") ","element":"span"},{"text":"(where ","element":"span"},{"style":{"height":13.39},"width":46.19,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-33.png","element":"img","alt":"κQ","inline":true,"padRight":true},{"text":"is the reproducing kernel of the RKHS containing the action-value function) can no longer be justified by the representer theorem [64]. By defining the RKHS ","element":"span"},{"style":{"height":18.75},"width":78.29,"height":46.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-34.png","element":"img","alt":" HψQ","inline":true,"padRight":true},{"text":"as in Theorem ","element":"span"},{"href":"#id-7","text":"IV.3, ","element":"a"},{"text":"however, we can view action-value function approximation as supervised learning in the RKHS ","element":"span"},{"style":{"height":18.75},"width":78.29,"height":46.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-35.png","element":"img","alt":" HψQ","inline":true},{"text":", and thereby overcome the aforementioned issue. 
We mention that when an adaptive filter is employed in the RKHS ","element":"span"},{"style":{"height":18.75},"width":78.29,"height":46.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-36.png","element":"img","alt":" HψQ","inline":true},{"text":", we do not have to reset learning even after policies are updated or the dynamics changes, since the domain of ","element":"span"},{"style":{"height":18.75},"width":78.29,"height":46.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-37.png","element":"img","alt":" HψQ","inline":true,"padRight":true},{"text":"is ","element":"span"},{"style":{"height":11.6},"width":124,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-38.png","element":"img","alt":" Z ×Z","inline":true,"padRight":true},{"text":"instead of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Z ","element":"span"},{"text":". The example below indicates that our approach is general.","element":"span"}],[{"text":"As discussed in Section ","element":"span"},{"href":"#id-56","text":"II-A, ","element":"a"},{"text":"the least squares temporal difference algorithm has been extended to kernel-based methods including the GP SARSA [37]. 
Given a set of input data ","element":"span"},{"style":{"fontStyle":"italic"},"text":"{","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"height":16.93},"width":183.2,"height":42.34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-39.png","element":"img","alt":"}n=0,1,...,Nd,","inline":true,"padRight":true},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":":","element":"span"},{"text":"= [","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"]","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"style":{"fontStyle":"italic"},"text":"N","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"style":{"height":14.01},"width":98.02,"height":35.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-40.png","element":"img","alt":" ∈ Z>0","inline":true},{"text":", the posterior mean ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q ","element":"span"},{"text":"and variance ","element":"span"},{"style":{"height":20.76},"width":64.55,"height":51.9,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-41.png","element":"img","alt":" µQ2","inline":true,"padRight":true},{"text":"of 
","element":"span"},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"style":{"height":23.39},"width":31.93,"height":58.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-42.png","element":"img","alt":"φNd","inline":true,"padRight":true},{"text":"at a point ","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"height":13.99},"width":102.99,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-43.png","element":"img","alt":"∗ ∈ Z","inline":true,"padRight":true},{"text":"are given by","element":"span"}],[{"id":"id-57","style":{"width":"99%"},"width":1004,"height":716,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-44.png","element":"img"}],[{"text":"If we employ a GP for learning ","element":"span"},{"style":{"height":16.99},"width":51.65,"height":42.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-45.png","element":"img","alt":" ψQ","inline":true,"padRight":true},{"text":"in ","element":"span"},{"style":{"height":18.75},"width":78.29,"height":46.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-46.png","element":"img","alt":" HψQ","inline":true,"padRight":true},{"text":"defined in Theorem ","element":"span"},{"href":"#id-7","text":"IV.3, ","element":"a"},{"text":"the posterior mean ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"style":{"height":13.48},"width":39.68,"height":33.7,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-47.png","element":"img","alt":"ψQ","inline":true,"padRight":true},{"text":"and variance ","element":"span"},{"style":{"height":23.54},"width":85.19,"height":58.85,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-48.png","element":"img","alt":" µψQ2","inline":true,"padRight":true},{"text":"of 
ˆ","element":"span"},{"style":{"height":22.48},"width":59.27,"height":56.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-49.png","element":"img","alt":"ψQNd","inline":true,"padRight":true},{"text":"at a point ","element":"span"},{"style":{"height":16},"width":43.73,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-50.png","element":"img","alt":" [z∗","inline":true},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"w","element":"span"},{"style":{"height":16},"width":198.34,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-51.png","element":"img","alt":"∗] ∈ Z ×Z","inline":true,"padRight":true},{"text":"are given by","element":"span"}],[{"style":{"width":"92%"},"width":924,"height":135,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-52.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"fontWeight":"bold"},"text":"k","element":"span"},{"style":{"fontStyle":"italic"},"text":"N","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":":","element":"span"},{"style":{"height":16},"width":133.68,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-53.png","element":"img","alt":"= [κ([z∗","inline":true},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"w","element":"span"},{"style":{"height":16},"width":54.24,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-54.png","element":"img","alt":"∗],[","inline":true},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"text":"0","element":"span"},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"text":"1","element":"span"},{"style":{"height":16},"width":192.71,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-55.png","element":"img","alt":"]);··· 
;κ([z∗","inline":true},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"w","element":"span"},{"style":{"height":16},"width":54.24,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-56.png","element":"img","alt":"∗],[","inline":true},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"fontStyle":"italic"},"text":"N","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"style":{"height":7.6},"width":37.91,"height":19,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-57.png","element":"img","alt":"−1","inline":true},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"fontStyle":"italic"},"text":"N","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":"])]","element":"span"},{"text":", and the ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"i","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j","element":"span"},{"text":") ","element":"span"},{"text":"entry of ","element":"span"},{"style":{"fontWeight":"bold"},"text":"K ","element":"span"},{"style":{"height":14.19},"width":135.46,"height":35.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-58.png","element":"img","alt":" ∈ RN×N","inline":true,"padRight":true},{"text":"is ","element":"span"},{"style":{"height":16},"width":51.65,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-59.png","element":"img","alt":" 
κ([","inline":true},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"fontStyle":"italic"},"text":"i","element":"span"},{"style":{"height":7.6},"width":37.92,"height":19,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-60.png","element":"img","alt":"−1","inline":true},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"fontStyle":"italic"},"text":"i","element":"span"},{"text":"]","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"text":"[","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"fontStyle":"italic"},"text":"j","element":"span"},{"style":{"height":7.6},"width":37.91,"height":19,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-61.png","element":"img","alt":"−1","inline":true},{"text":";","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"fontStyle":"italic"},"text":"j","element":"span"},{"text":"])","element":"span"},{"text":". 
Then, the posterior mean ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q ","element":"span"},{"text":"and variance ","element":"span"},{"style":{"height":20.76},"width":64.54,"height":51.9,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-62.png","element":"img","alt":" µQ2","inline":true,"padRight":true},{"text":"of ","element":"span"},{"text":"ˆ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"style":{"height":23.39},"width":31.93,"height":58.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-63.png","element":"img","alt":"φNd","inline":true,"padRight":true},{"text":"at a ","element":"span"},{"text":"point ","element":"span"},{"style":{"fontWeight":"bold"},"text":"z","element":"span"},{"style":{"height":13.99},"width":102.99,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-64.png","element":"img","alt":"∗ ∈ Z","inline":true,"padRight":true},{"text":"are given by","element":"span"}],[{"style":{"width":"85%"},"width":854,"height":137,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1801.09627/images/17-65.png","element":"img"}],[{"text":"which result in the same values as ","element":"span"},{"href":"#id-57","text":"(I.1) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-57","text":"(I.2)","element":"a"},{"text":".","element":"span"}]]},{"heading":"ACKNOWLEDGMENTS","paragraphs":[[{"text":"M. Ohnishi thanks all of those who have given him insightful comments on this work, including the members of the Georgia Robotics and Intelligent Systems Laboratory. The authors thank all of the anonymous reviewers for their constructive suggestions.","element":"span"}]]},{"heading":"REFERENCES","paragraphs":[[{"text":"[1] R. S. Sutton and A. G. 
Barto, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Reinforcement learning: An introduction","element":"span"},{"text":". MIT Press, 1998.","element":"span"}],[{"text":"[2] F. L. Lewis and D. Vrabie, “Reinforcement learning and adaptive dynamic programming for feedback control,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"IEEE Circuits and Systems Magazine","element":"span"},{"text":", vol. 9, no. 3, 2009.","element":"span"}],[{"text":"[3] D. Liberzon, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Calculus of variations and optimal control theory: a concise introduction","element":"span"},{"text":". ","element":"span"},{"text":"Princeton University Press, 2011.","element":"span"}],[{"text":"[4] F. Berkenkamp, M. Turchetta, A. P. Schoellig, and A. Krause, “Safe model-based reinforcement learning with stability guarantees,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proc. NIPS","element":"span"},{"text":", 2017.","element":"span"}],[{"text":"[5] F. Berkenkamp, R. Moriconi, A. P. Schoellig, and A. Krause, “Safe learning of regions of attraction for uncertain, nonlinear systems with Gaussian processes,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proc. CDC","element":"span"},{"text":", 2016, pp. 4661–4666.","element":"span"}],[{"text":"[6] J. Schreiter, D. Nguyen-Tuong, M. Eberts, B. Bischoff, H. Markert, and M. Toussaint, “Safe exploration for active learning with Gaussian processes,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proc. ECML PKDD","element":"span"},{"text":", 2015, pp. 133–149.","element":"span"}],[{"text":"[7] A. K. Akametalu, J. F. Fisac, J. H. Gillula, S. Kaynama, M. N. Zeilinger, and C. J. Tomlin, “Reachability-based safe learning with Gaussian processes,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proc. CDC","element":"span"},{"text":", 2014, pp. 1424–1431.","element":"span"}],[{"text":"[8] S. Shalev-Shwartz, S. Shammah, and A. 
Shashua, “Safe, multiagent, reinforcement learning for autonomous driving,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv preprint ","element":"span"},{"href":"http://arxiv.org/abs/1610.03295","style":{"fontStyle":"italic"},"text":"arXiv:1610.03295","element":"a"},{"text":", 2016.","element":"span"}],[{"text":"[9] H. B. Ammar, R. Tutunov, and E. Eaton, “Safe policy search for lifelong reinforcement learning with sublinear regret,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proc. ICML","element":"span"},{"text":", 2015, pp. 2361–2369.","element":"span"}],[{"text":"[10] B. van Niekerk, A. Damianou, and B. Rosman, “Online constrained model-based reinforcement learning,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proc. AUAI","element":"span"},{"text":", 2017.","element":"span"}],[{"text":"[11] J. Achiam, D. Held, A. Tamar, and P. Abbeel, “Constrained policy optimization,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proc. ICML","element":"span"},{"text":", 2017.","element":"span"}],[{"text":"[12] P. Abbeel and A. Y. Ng, “Exploration and apprenticeship learning in reinforcement learning,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proc. ICML","element":"span"},{"text":", 2005, pp. 1–8.","element":"span"}],[{"text":"[13] L. Wang, E. A. Theodorou, and M. Egerstedt, “Safe learning of quadrotor dynamics using barrier certificates,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"IEEE Proc. ICRA","element":"span"},{"text":", 2018, pp. 2460–2465.","element":"span"}],[{"text":"[14] J. García and F. Fernández, “A comprehensive survey on safe reinforcement learning,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"J. Mach. Learn. Res.","element":"span"},{"text":", vol. 16, no. 1, pp. 1437–1480, 2015.","element":"span"}],[{"text":"[15] B. D. Argall, S. Chernova, M. Veloso, and B. 
Browning, “A survey of robot learning from demonstration,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Robotics and Autonomous Systems","element":"span"},{"text":", vol. 57, no. 5, pp. 469–483, 2009.","element":"span"}],[{"text":"[16] P. Geibel, “Reinforcement learning for MDPs with constraints,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proc. ECML","element":"span"},{"text":", vol. 4212, 2006, pp. 646–653.","element":"span"}],[{"text":"[17] S. P. Coraluppi and S. I. Marcus, “Risk-sensitive and minimax control of discrete-time, finite-state Markov decision processes,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Automatica","element":"span"},{"text":", vol. 35, no. 2, pp. 301–309, 1999.","element":"span"}],[{"text":"[18] C. E. Rasmussen and C. K. Williams, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Gaussian processes for machine learning","element":"span"},{"text":". ","element":"span"},{"text":"MIT Press, Cambridge, 2006, vol. 1.","element":"span"}],[{"text":"[19] X. Xu, P. Tabuada, J. W. Grizzle, and A. D. Ames, “Robustness of control barrier functions for safety critical control,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proc. IFAC","element":"span"},{"text":", vol. 48, no. 27, 2015, pp. 54–61.","element":"span"}],[{"text":"[20] P. Wieland and F. Allgöwer, “Constructive safety using control barrier functions,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proc. IFAC","element":"span"},{"text":", vol. 40, no. 12, 2007, pp. 462–467.","element":"span"}],[{"text":"[21] P. Glotfelter, J. Cortés, and M. Egerstedt, “Nonsmooth barrier functions with applications to multi-robot systems,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"IEEE Control Systems Letters","element":"span"},{"text":", vol. 1, no. 2, pp. 310–315, 2017.","element":"span"}],[{"text":"[22] L. Wang, A. D. Ames, and M. 
Egerstedt, “Safety barrier certificates for collisions-free multirobot systems,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"IEEE Trans. Robotics","element":"span"},{"text":", 2017.","element":"span"}],[{"text":"[23] A. D. Ames, X. Xu, J. W. Grizzle, and P. Tabuada, “Control barrier function based quadratic programs for safety critical systems,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"IEEE Trans. Automatic Control","element":"span"},{"text":", vol. 62, no. 8, pp. 3861–3876, 2017.","element":"span"}],[{"text":"[24] A. Agrawal and K. Sreenath, “Discrete control barrier functions for safety-critical control of discrete systems with application to bipedal robot navigation,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proc. RSS","element":"span"},{"text":", 2017.","element":"span"}],[{"text":"[25] J. Kober, J. A. Bagnell, and J. Peters, “Reinforcement learning in robotics: A survey,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The International Journal of Robotics Research","element":"span"},{"text":", vol. 32, no. 11, pp. 1238–1274, 2013.","element":"span"}],[{"text":"[26] V. M. Janakiraman, X. L. Nguyen, and D. Assanis, “A Lyapunov based stable online learning algorithm for nonlinear dynamical systems using extreme learning machines,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"IEEE Proc. IJCNN","element":"span"},{"text":", 2013, pp. 1–8.","element":"span"}],[{"text":"[27] M. French and E. Rogers, “Non-linear iterative learning by an adaptive Lyapunov technique,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"International Journal of Control","element":"span"},{"text":", vol. 73, no. 10, pp. 840–850, 2000.","element":"span"}],[{"text":"[28] M. M. Polycarpou, “Stable adaptive neural control scheme for nonlinear systems,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"IEEE Trans. Automatic Control","element":"span"},{"text":", vol. 41, no. 3, pp. 
447–451, 1996.","element":"span"}],[{"text":"[29] K. J. ","element":"span"},{"text":"Åström and B. Wittenmark, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Adaptive control","element":"span"},{"text":". Courier Corporation, 2013.","element":"span"}],[{"text":"[30] C. A. Cheng and H. P. Huang, “Learn the Lagrangian: A vector-valued RKHS approach to identifying Lagrangian systems,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"IEEE Trans. Cybernetics","element":"span"},{"text":", vol. 46, no. 12, pp. 3247–3258, 2016.","element":"span"}],[{"text":"[31] D. Ormoneit and P. Glynn, “Kernel-based reinforcement learning in average-cost problems,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"IEEE Trans. Automatic Control","element":"span"},{"text":", vol. 47, no. 10, pp. 1624–1636, 2002.","element":"span"}],[{"text":"[32] X. Xu, D. Hu, and X. Lu, “Kernel-based least squares policy iteration for reinforcement learning,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"IEEE Trans. Neural Networks","element":"span"},{"text":", vol. 18, no. 4, pp. 973–992, 2007.","element":"span"}],[{"text":"[33] G. Taylor and R. Parr, “Kernelized value function approximation for reinforcement learning,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proc. ICML","element":"span"},{"text":", 2009, pp. 1017–1024.","element":"span"}],[{"text":"[34] W. Sun and J. A. Bagnell, “Online Bellman residual and temporal difference algorithms with predictive error guarantees,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proc. IJCAI","element":"span"},{"text":", 2016.","element":"span"}],[{"text":"[35] Y. Nishiyama, A. Boularias, A. Gretton, and K. Fukumizu, “Hilbert space embeddings of POMDPs,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proc. UAI","element":"span"},{"text":", 2012.","element":"span"}],[{"text":"[36] S. Grünewälder, G. Lever, L. Baldassarre, M. 
Pontil, and A. Gretton, “Modelling transition dynamics in MDPs with RKHS embeddings,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proc. ICML","element":"span"},{"text":", 2012.","element":"span"}],[{"text":"[37] Y. Engel, S. Mannor, and R. Meir, “Reinforcement learning with Gaussian processes,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proc. ICML","element":"span"},{"text":", 2005, pp. 201–208.","element":"span"}],[{"text":"[38] A. Barreto, D. Precup, and J. Pineau, “Practical kernel-based reinforcement learning,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"J. Mach. Learn. Res.","element":"span"},{"text":", vol. 17, no. 1, pp. 2372–2441, 2016.","element":"span"}],[{"text":"[39] A. S. Barreto, D. Precup, and J. Pineau, “Reinforcement learning using kernel-based stochastic factorization,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proc. NIPS","element":"span"},{"text":", 2011, pp. 720–728.","element":"span"}],[{"text":"[40] B. Kveton and G. Theocharous, “Kernel-based reinforcement learning on representative states,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proc. AAAI","element":"span"},{"text":", 2012.","element":"span"}],[{"text":"[41] J. Bae, P. Chhatbar, J. T. Francis, J. C. Sanchez, and J. C. Principe, “Reinforcement learning via kernel temporal difference,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"IEEE Proc. EMBC","element":"span"},{"text":", 2011.","element":"span"}],[{"text":"[42] J. Reisinger, P. Stone, and R. Miikkulainen, “Online kernel selection for Bayesian reinforcement learning,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proc. ICML","element":"span"},{"text":", 2008, pp. 816–823.","element":"span"}],[{"text":"[43] Y. Cui, T. Matsubara, and K. 
Sugimoto, “Kernel dynamic policy programming: Applicable reinforcement learning to robot systems with high dimensional states,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Neural Networks","element":"span"},{"text":", vol. 94, pp. 13–23, 2017.","element":"span"}],[{"text":"[44] H. Van H., J. Peters, and G. Neumann, “Learning of non-parametric control policies with high-dimensional state features,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Artificial Intelligence and Statistics","element":"span"},{"text":", 2015, pp. 995–1003.","element":"span"}],[{"text":"[45] N. Aronszajn, “Theory of reproducing kernels,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Trans. Amer. Math. Soc.","element":"span"},{"text":", vol. 68, no. 3, pp. 337–404, May 1950.","element":"span"}],[{"text":"[46] I. Steinwart, “On the influence of the kernel on the consistency of support vector machines,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"J. Mach. Learn. Res.","element":"span"},{"text":", vol. 2, pp. 67–93, 2001.","element":"span"}],[{"text":"[47] I. Yamada and N. Ogura, “Adaptive projected subgradient method for asymptotic minimization of sequence of nonnegative convex functions,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Numerical Functional Analysis and Optimization","element":"span"},{"text":", vol. 25, no. 7&8, pp. 593–617, 2004.","element":"span"}],[{"text":"[48] M. L. Puterman and S. L. Brumelle, “On the convergence of policy iteration in stationary dynamic programming,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Mathematics of Operations Research","element":"span"},{"text":", vol. 4, no. 1, pp. 60–69, 1979.","element":"span"}],[{"text":"[49] D. P. Bertsekas, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Dynamic programming and optimal control","element":"span"},{"text":". ","element":"span"},{"text":"Athena Scientific Belmont, MA, 2005, vol. 1, no. 
3.","element":"span"}],[{"text":"[50] C. D. McKinnon and A. P. Schoellig, “Experience-based model selection to enable long-term, safe control for repetitive tasks under changing conditions,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"IEEE Proc. IROS","element":"span"},{"text":", 2018, pp. 2977–2984.","element":"span"}],[{"text":"[51] A. Berlinet and A. C. Thomas, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Reproducing kernel Hilbert spaces in probability and statistics","element":"span"},{"text":". ","element":"span"},{"text":"Kluwer, 2004.","element":"span"}],[{"text":"[52] D. Pickem, P. Glotfelter, L. Wang, M. Mote, A. Ames, E. Feron, and M. Egerstedt, “The robotarium: A remotely accessible swarm robotics research testbed,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"IEEE Proc. ICRA","element":"span"},{"text":", 2017, pp. 1699–1706.","element":"span"}],[{"text":"[53] Z. P. Jiang and Y. Wang, “A converse Lyapunov theorem for discrete-time systems with disturbances,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Systems & Control Letters","element":"span"},{"text":", vol. 45, no. 1, pp. 49–58, 2002.","element":"span"}],[{"text":"[54] H. Q. Minh, “Some properties of Gaussian reproducing kernel Hilbert spaces and their implications for function approximation and learning theory,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Constructive Approximation","element":"span"},{"text":", vol. 32, no. 2, pp. 307–338, 2010.","element":"span"}],[{"text":"[55] R. A. Ryan, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Introduction to tensor products of Banach spaces","element":"span"},{"text":". Springer Science & Business Media, 2013.","element":"span"}],[{"text":"[56] G. Strang, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Introduction to linear algebra","element":"span"},{"text":". ","element":"span"},{"text":"Wellesley-Cambridge Press Wellesley, MA, 1993, vol. 
3.","element":"span"}],[{"text":"[57] L. V. Ahlfors, “Complex analysis: an introduction to the theory of analytic functions of one complex variable,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"New York, London","element":"span"},{"text":", p. 177, 1953.","element":"span"}],[{"text":"[58] W. Liu, J. Príncipe, and S. Haykin, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Kernel adaptive filtering","element":"span"},{"text":". ","element":"span"},{"text":"New Jersey: Wiley, 2010.","element":"span"}],[{"text":"[59] K. R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf, “An introduction to kernel-based learning algorithms,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"IEEE Trans. Neural Networks","element":"span"},{"text":", vol. 12, no. 2, pp. 181–201, 2001.","element":"span"}],[{"text":"[60] B. Schölkopf and A. Smola, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Learning with kernels","element":"span"},{"text":". ","element":"span"},{"text":"MIT Press, Cambridge, 2002.","element":"span"}],[{"text":"[61] M. Yukawa, “Multikernel adaptive filtering,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"IEEE Trans. Signal Processing","element":"span"},{"text":", vol. 60, no. 9, pp. 4672–4682, Sept. 2012.","element":"span"}],[{"text":"[62] Y. Murakami, M. Yamagishi, M. Yukawa, and I. Yamada, “A sparse adaptive filtering using time-varying soft-thresholding techniques,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proc. IEEE ICASSP","element":"span"},{"text":", 2010, pp. 3734–3737.","element":"span"}],[{"text":"[63] M. Yukawa, “Adaptive learning in Cartesian product of reproducing kernel Hilbert spaces,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"IEEE Trans. Signal Processing","element":"span"},{"text":", vol. 63, no. 22, pp. 6037–6048, Nov. 2015.","element":"span"}],[{"text":"[64] G. Kimeldorf and G. 
Wahba, “Some results on Tchebycheffian spline functions,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Mathematical Analysis and Applications","element":"span"},{"text":", vol. 33, no. 1, pp. 82–95, 1971.","element":"span"}]]}],"_version":"3.3.4"},"paperNode":"$28:props:children:props:children:0:props:product"}]]