Verification of Neural Network Control Policy Under Persistent Adversarial Perturbation

2019·Arxiv

Abstract

Abstract

Deep neural networks are known to be fragile to small adversarial perturbations. This issue becomes more critical when a neural network is interconnected with a physical system in a closed loop. In this paper, we show how to combine recent works on neural network certification tools (which are mainly used in static settings such as image classification) with robust control theory to certify a neural network policy in a control loop. Specifically, we give a sufficient condition and an algorithm to ensure that the closed loop state and control constraints are satisfied when the persistent adversarial perturbation is norm bounded. Our method is based on finding a positively invariant set of the closed loop dynamical system, and thus we do not require the differentiability or the continuity of the neural network policy. Along with the verification result, we also develop an effective attack strategy for neural network control systems that outperforms exhaustive Monte-Carlo search significantly. We show that our certification algorithm works well on learned models and achieves 5 times better result than the traditional Lipschitz-based method to certify the robustness of a neural network policy on a cart pole control problem.

1 Introduction

Deep neural networks (DNN) have become state-of-the-arts in a variety of machine learning tasks, including control of a physical system in the reinforcement learning setting. For example, in guided policy search [1], a neural network policy is used to replace the online model predictive control policy to directly control a physical system in a feedback loop. Yet, recent studies demonstrate that neural networks are surprisingly fragile - the neural network policies [2], neural network classifiers [3] and many other tasks [4, 5, 6] are all vulnerable to adversarial examples and attacks. For applications that are safety-critical, such as self-driving cars, the existence of adversarial examples in neural networks has raised severe and unprecedented concerns due to recent popularity on deploying neural network based policies in various physical systems. A small perturbation on the input of the neural network may cause significant change in control actions, which could possibly destabilize the system or drive the system state to an unsafe region. Therefore, it is crucial to develop a verification tool that can provide useful certificate (such as safety or robustness guarantees) for a dynamical system with a neural network policy in the loop.

Recently, much research effort has been devoted to developing verification methods to quantify the robustness of neural networks against adversarial input perturbations. The key idea is to characterize the set of neural network output when the adversarial perturbations are constrained in a given set, usually a weighted ball. Current mainstream verification methods can be categorized to exact verifiers [7] and inexact verifiers [8, 9, 10, 11, 12, 13] based on the characterization of output reachable set being exact or over-approximated. Exact verifiers are hard to scale to large and deep neural networks due to its intrinsic NP-completeness [7]; hence often inexact verifiers are more favorable in real neural network applications due to their scalability and capability to deliver non-trivial robustness certificate efficiently. Nevertheless, all the above verifiers are developed in a static setting, specifically for neural network classifiers, where the neural networks do not interact with a dynamical system. As neural network policies are deployed in many real-world systems [1, 14], it is necessary to extend the static neural network certification tool to a dynamic setting, so one can certify a neural network policy in a feedback control loop.

In this paper, we propose a novel framework to verify neural network policies in a closed loop system by combining the aforementioned static neural network certification tools with robust control theory. We consider a realistic and strong threat model where the adversaries can manipulate observations and states at every time step over an infinite horizon (hence the so-called persistent perturbations). Our key idea is to leverage the static neural network certification tools to give a tight input-output characterization of the neural network policy, that when combine with robust control theory, allow us to find a positively invariant set of the closed loop dynamical system. Our contributions are summarized as below.

Contributions.

• To our best knowledge, our work is the first one to extend the neural network certification tools [7, 8, 9, 10, 11, 12, 13] to a dynamic setting, in which we certify a neural network policy in a feedback control loop under persistent adversarial perturbation.

• Our framework can handle non-Lipschitz and discontinuous neural network policies, while the traditional Lipschitz-based robust control approach cannot. Even when the neural network policy is Lipschitz continuous, we prove theoretically and validate experimentally that our proposed framework is always better than the traditional Lipschitz-based robust control approach in terms of tighter certification bound.

• We demonstrate that our method works well on situations where the system dynamic is unknown, unstable, and nonlinear. As long as one can learn the model and over-approximate the modeling error using a data driven approach, our certification algorithm can be applied.

• Along with the certification algorithm, we develop an effective persistent attack algorithm that can successfully discover the vulnerability of a neural network control system and show that such vulnerability cannot be found by exhaustive Monte-Carlo simulation. This observation indicates that the robustness of a neural network control system cannot be certified via exhaustive simulation, and the mathematical-based certification framework developed in this paper is necessary.

Outline. We organize the paper as follows. We summarize related works and notations by the end of this section, and then describe problem setting and formulations in Section 2. We present our main theorems and algorithms in Sections 3 and 4, where we propose an algorithm to find a certificate to ensure that the closed loop state and control constraints are satisfied when the persistent adversarial attack signal is norm bounded. In Section 5, we conduct a comprehensive case study and show that our proposed algorithms not only deliver much stronger certificates than traditional robust control approaches but also work well on both learned models and non-Lipschitz neural network policies.

Related works. Our work is closely related to the neural network certification tools mentioned above. Specifically, given a neural network policy , these tools can certify that U for all for some sets U and Y. We extend these tools to a dynamic setting where the neural network output u can affect the neural network input y in the future via a feedback loop. Another closely related field is safe reinforcement learning [15, 16], where the safety during policy exploration and deployment are investigated. Recent works in [17, 18, 19, 20, 21] can further give a certificate of stability or safety properties of a neural network control system. Our work differs from the above works in two aspects. First, we explicitly incorporate the adversarial perturbation in our formulation, especially the persistent perturbation sequence with bounded norm (the norm and the energy of the signal are unbounded). The Lyapunov based method [18] and the hybrid system method [21] do not deal with perturbation and the integral quadratic constraint based method [19, 20] is developed with bounded norm on the perturbation. The second difference is that our method does not require Lipschitz continuity on the neural network policy, a common assumption in most of the literature. Therefore, our method can be applied on neural networks that are discontinuous or non-differentiable in nature – due to quantization, digitalization, switching logic, obfuscated gradient [22], or other defense strategies. Even when the neural network policy is Lipschitz continuous, we prove theoretically and validate experimentally that our method can give a tighter bound than the Lipschitz-based robust control method. Another relevant work is the reachability analysis of a neural network control system over a finite time horizon [23]. In this paper, we derive safety guarantee over an infinite time horizon with tighter bounds.

Notations. We use lower case letters such as x to denote vectors and upper case letters such as A to denote matrices. We use to denote that x is element-wisely less than or equal to y. For a square matrix to denote its spectral radius, which is the largest absolute value of the eigenvalues of A. We use boldface letters such as to denote signals and transfer matrices in the frequency domain, respectively. Consider a time series signal , the unilateral z-transform of the time series is given by to denote the t-th spectral component of a transfer matrix , i.e., . We use x and as shorthand of x(z) and when the context makes it clear. For a frequency domain equation , the corresponding time domain equation is given by the convolution formula We use to denote the standard norm of the signal x and use and to denote the ] of the system . We define the absolute operator absa stable transfer matrix . Note that each element of the transfer matrix is absolutely summable if and only if the transfer matrix is real rational and stable (see Page 113 -114 of [26]).

2 Problem Formulation

Figure 1: A neural network policy interconnected with an uncertain dynamical system.

We consider a neural network policy interconnected with a dynamical system. The model architecture is shown in Figure 1, where is the neural network control policy, P is the plant to be controlled, and characterizes the uncertainty of the model P. When the model dynamics is unknown in the first place, one can use data driven approaches such as [27, 28] to learn the nominal model P and over-approximate the modeling error using . We assume that P is a discrete time linear time invariant (LTI) system with dynamics given by

where t denotes the time index, x the state vector, u the control action, y the measurement, w the external perturbation, and and the input and output of the uncertainty block . In particular, w is a persistent perturbation over an infinite horizon . In the rest of the paper, we assume zero initial condition unless stated otherwise:

This is to simplify the presentation – the method proposed in this paper can be readily extended to handle persistent perturbation with nonzero initial condition as well. We assume the pair (A, B) is

stabilizable and (A, C) is detectable. The neural network policy is described by

which is assumed to be a static policy in this paper. The uncertainty block can be dynamic, with

in the frequency domain. We assume that is stable and norm bounded. Equations (1) - (6) give a complete description of the model architecture shown in Figure 1.

Remark. If the input to the neural network policy is a finite horizon of historical measurement, i.e., for a finite length T, then we can augment the state x and measurement y for T steps and convert to a static policy.

Remark. Note that y in (2) is not a direct function of is not a direct function of . This assumption ensures that the feedback structure in Figure 1 is well-posed because there is no algebraic loop in the equation.

In this paper, we use w to model the persistent adversarial perturbation. We assume that w lies in the set , or equivalently,

The matrices and in (1) - (2) control the impact of w on the state x and the neural network input y. We note that the block can be used to model another type of adversarial perturbation, such as changing the physical parameters of the model. The goal of this paper is the following:

Goal. Given the model equations (1) - (6), design an algorithm to certify that the following requirement is satisfied for any adversarial sequence

3 Closed Loop Boundedness

In this section, we derive a sufficient condition to ensure that the state x, measurement y, and control u in Figure 1 are bounded for any adversarial attack w lies within the ball (7). In Section 3.1, we assume that the plant model (1) - (3) is open loop stable, i.e., , and directly apply traditional robust control theory to obtain Lemma 1 for closed loop stability. We then explain the restrictions of Lemma 1 and improve the results by combining robust control theory with neural network certification tools to obtain Theorem 1 in Section 3.2. Finally, we extend our results to unstable plants in Section 3.3.

3.1 Stable Plant: robust control baseline

With the zero initial condition assumption, we can use the frequency domain notation introduced before to rewrite (1) - (3) as

with ,

To show the input-output stability1 of the structure in Figure 1, we first assume that the neural network policy is locally Lipschitz continuous with a finite to gain , i.e., over some range . For a stable transfer matrix, the to induced norm is known as the system norm [24], which is defined by

where is the (i, j)-th entry of the transfer matrix G and n is the number of column of G. The following Lemma gives the local input-output stability of Figure 1, which directly comes from robust control theory:

Lemma 1. Consider a stable LTI plant (1) - (4) interconnected with a neural network policy (5) and a dynamic uncertainty block (6) as shown in Figure 1. Assume that the persistent perturbation w lies in the set given by (7). Suppose that the neural network policy has a finite to gain for all , and the uncertainty block has the property . If the following three conditions hold:

ββ

then the closed loop system in Figure 1 is input-output stable over the region for all adversarial attack w satisfying (7).

The proof of Lemma 1 can be found in Appendix A. To certify the local input-output stabilit of the closed loop system using Lemma 1, we can iteratively search for until the local Lipschitz continuous assumption and (10a) - (10c) are both satisfied, or the state constraint is violated. Note that when is the global to bound of the neural network policy, we can drop the condition (10c) because can be arbitrarily large. The global version of Lemma 1, i.e., with being arbitrarily large, can be found in the robust control literature in [29]. When the magnitude of the signals w and y is characterized by the norm, we can replace the norm in Lemma 1 by the norm, as the norm is equivalent to the to induced norm. We include the global and version of Lemma 1 as a corollary in Appendix A, which is an unstructured version of the main loop theorem2 [30, 25] in the robust control literature (control and structured singular value).

3.2 Stable Plant: improvement

Lemma 1 has several restrictions. First, the neural network policy needs to be Lipschitz continuous, and thus Lemma 1 cannot be applied on non-differentiable or discontinuous policies due to quantization or other issues. Second, even when the given neural network policy is Lipschitz continuous, the bound of the local Lipschitz constant for a deep neural network policy is usually very loose. As a consequence, Lemma 1 can only be applied to certify the robustness of Figure 1 over a small region near the stable equilibrium. Finally and most importantly, even if the given policy is Lipschitz continuous and the local Lipschitz constant to characterize the input-output relation of a given neural network is usually very loose. In this subsection, we improve the results of Lemma 1 and propose a more useful Theorem to certify the boundedness of the structure in Figure 1.

Our strategy is to use the static neural network certification tool to give a tighter characterization of the input-output relation of the given neural network policy. The following Theorem gives a sufficient condition to ensure closed loop boundedness under any adversarial attack satisfying (7).

Theorem 1. Consider a stable LTI plant (1) - (4) interconnected with a neural network policy (5) and a dynamic uncertainty block (6) as shown in Figure 1. Assume that the persistent perturbation w lies in the set given by (7). If we can find a quadruplet satisfying the following conditions:

1. Neural network policy robustness certificate: The static policy has the property

2. Input-output relation of the uncertainty block: has the property ,

3. Feedback condition: abs(+ abs(+ abs() and abs(+ abs(

then we have the following properties:

1. Bounded feedback signals:

2. Bounded state:

The key idea of Theorem 1 is to combine a static neural network certification algorithm [8, 9, 10, 11, 12, 13] with robust control theory to certify a neural network policy in a feedback control loop. The first condition of Theorem 1 is an input-output characterization of the neural network policy, which can be obtained by a static neural network certification algorithm. The second condition is a characterization of the model uncertainty block . The third condition, when combining with the first two conditions, ensures that is a positively invariant set of the closed loop dynamical system. The complete proof of Theorem 1 is in Appendix A. Theorem 1 can be used as follows: if we have , then the requirement (8) is satisfied. We will discuss how to find a quadruplet satisfying the conditions of Theorem 1 in Section 4.

Theorem 1 has several advantages over Lemma 1. First, Theorem 1 does not require the differentiability or continuity of the neural network policy . Theorem 1 is valid as long as the property for all can be certified. This is one of the key difference between our approach and the existing literature [17, 18, 19, 20], where the results are obtained based on the assumption of Lipschitz continuity. Second, the neural network certification tool can give a tighter input-output characterization of the neural network policy than the local Lipschitz constant. As a result, the conditions of Theorem 1 are less restricted and easier to satisfy. Indeed, the following Theorem (proof in Appendix A) claims that the conditions of Lemma 1 implies that the sufficient conditions of Theorem 1 will always hold. This means that Theorem 1 can be applied on a strictly larger class of problems than that of Lemma 1.

Theorem 2. The conditions of Lemma 1 imply the conditions of Theorem 1.

Note that Theorem 1 only certifies the boundedness of the closed loop system, not the stability of the closed loop system – Theorem 1 does not guarantee that x = 0 is a stable equilibrium. However, in the presence of persistent adversarial perturbation, we argue that there is no significant difference between boundedness and stability because both of them will have a finite yet nonzero state deviation. In our case study in Section 5, we show that Theorem 1 is much more useful than the traditional robust control approach (Lemma 1) as it can certify the requirement (8) with persistent adversarial attack (7) that is 5 times larger.

3.3 Unstable Plant

In this subsection, we consider the case where the plant (1) - (3) is unstable. Our strategy is to extract a first order approximation of the neural network policy to stabilize the plant first, then analyze the interconnection of the stabilized plant and the residual control policy. Specifically, we rewrite the neural network policy as for some matrix residual control policy. Equations (1) - (3) then become

with the neural network policy . As long as the spectral radius of the closed loop system matrix is less than 1, the transfer matrix is stable. Theorem 1 can then be used with redefined transfer matrices: for instance, we have , , and . Other transfer matrices can be derived in a similar manner. In the following, we propose two different ways to obtain a candidate : (1) using the Jacobian evaluated at the origin, and (2) using neural network certification tool to find a first order approximation of the policy over a region.

If the neural network policy has the property and is differentiable at y = 0, we can use the Jacobian of evaluated at y = 0 as a candidate for , i.e., . In this case, the

residual control policy has the following property:

We then have the following Lemma adopt from Theorem 4.3 in [31] (uncertainty is ignored).

Lemma 2 (Local stability). Consider the dynamical system (11a) - (11b) with the residual control policy satisfying (12). If , then x = 0 is a locally asymptotically stable equilibrium point of the system (11a) - (11c). On the other hand, if is a locally unstable equilibrium point of the system (11a) - (11c).

If the neural network policy is not differentiable or the equilibrium x = 0 is locally unstable according to Lemma 2, one can leverage the neural network certification tools to find a different candidate For instance, the method proposed in [9] provide a pair of linear lower and upper bounds on the neural network output as

A candidate is given by , which is a first order approximation of the control policy over the region . As long as we can find a to make , Theorem 1 is a valid sufficient condition to verify the requirement (8) under persistent adversarial perturbation (7). This statement holds even when a stable equilibrium does not exist.

4 Algorithms

In this section, we propose an iterative algorithm (Algorithm 1) to find a quadruplet satisfies the conditions of Theorem 1. We then propose a simple and effective attack strategy for a neural network control system.

4.1 Algorithm for Theorem 1

For an unstable plant, we assume that a matrix with is given as a default linear approximation of the neural network policy3. We assume that the uncertainty block is described by for a non-negative matrix . We explain why this form of uncertainty is natural when the model and uncertainty are learned using a data driven approach in [27] in Appendix B. The following algorithm finds a quadruplet satisfying the conditions of Theorem 1.

Algorithm 1 can be interpreted as follows. We first extract a linear policy K to make the closed loop transfer matrices stable and calculate the bounds for and over the region using any neural network certification tool. Meanwhile, we calculate the bound for based on the assumption of the uncertainty block . These two steps ensure that the quadruplet satisfies the first two conditions of Theorem 1. Given the range of the adversarial perturbation residual control action , and the uncertainty-induced input , we calculate the bounds for the state x, measurement . If we have satisfies the third condition of Theorem 1, thus we obtain a certificate for closed loop boundedness. If , we then increase the test bound and repeat the search.

4.2 Attack Algorithm

In addition to the certification algorithm, here we propose an algorithm to attack the neural network policy in the control loop. As an example, we show how to design a perturbation sequence with to attack the i-th state . Our idea is to follow Lines 4 - 6 of Algorithm 1 to construct the closed loop transfer matrices with the help of neural network certification tools. We then have

If we ignore the contribution of and , we can maximize using the following bounded perturbation sequence:

for each j. Note that (14) is sub-optimal because we ignore the contribution of . Nevertheless, we show in the next section that our attack (14) is extremely strong.

5 Case Study

In this section, we demonstrate our algorithm on a cart-pole example and show the following results:

(i) Our proposed framework (Theorem 1 and Algorithm 1) outperforms methods based on traditional robust control theory (Lemma 1). Specifically, Algorithm 1 can certify the boundedness of the closed loop system with attack level that is 5 times larger than that of a robust control approach. See the result in Figure 2a.

(ii) Our proposed attack algorithm (14) is far more effective than an exhuastive Monte-Carlo attack. In particular, our model-based attack algorithm can successfully discover the internal vulnerability of the closed loop system while an exhaustive Monte-Carlo simulation cannot. See the result in Figures 2b, 3a, and 3b.

(iii) We show that Algorithm 1 works well on situations where the system dynamics is unknown, unstable, and nonlinear. We use the technique in [27] to learn a nominal plant model P with conservative over-approximation of the modeling error , then use Algorithm 1 to certify the robustness of the neural network policy interconnected with the learned model. The result is close to that with the knowledge of the true model. See the result in Figure 4a.

(iv) We show that Algorithm 1 can be applied on a discontinuous and non-Lipschitz neural network policy, where the control action is quantized into discrete levels. Lipschitz-based methods [17, 18, 19, 20] cannot be used to certify the boundedness of a closed loop system in this case. See the result in Figure 4b.

Experiment setup. We use proximal policy optimization [33] in stable baselines [34] to train a 3-layer neural network policy for the cart-pole problem in Open-AI gym [35]. Our neural network has 16 neurons per hidden layer, ReLU activations, and continuous control output. We note that many trained neural network policies can obtain perfect reward in the training environment over a finite horizon, but will eventually become unstable if we simulate the closed loop dynamics over a longer time horizon. To obtain a more stable policy, we modify the reward function to be

Figure 2: Experiment I: Compare our methods with traditional approaches. (a): The safe region certified by our Algorithm 1 (area below the blue curve) is larger than the safe region certified by the traditional robust control approach (Lemma 1, area below the red curve). (b): The unsafe region discovered by our attack algorithm (14) (area above the black curve) is much larger than the unsafe region discovered by exhaustive Monte-Carlo simulation (area above the brown curve).

which will attempt to minimize the quadratic cost . The policy is trained with 2M steps. We use the neural network certification framework recently developed in [9, 12, 36] but any other neural network certification tools such as [7, 10, 11] can be used in Algorithm 1 as well. The stronger the neural network certification algorithm, the stronger the robustness certificate delivered by Algorithm 1.

For the ease of illustrating the result, we consider an one dimensional persistent perturbation on the pole angle measurement of the cart-pole. The requirement is to certify that a single state (angle of the pole) is within the user-specified limit. Note that Algorithm 1 can be applied to systems with multi-dimensional perturbations with user-specified requirement on both state, measurement, and control action. In the following experiments, we use attack level to represent the norm of the perturbation on pole angle measurement, and use state deviation limit to represent the limit on the norm of the pole angle state. Both the attack level and the state deviation are normalized with respect to 0.014 degree, which is the maximum attack level that can be certified by the Lipschitz-based robust control approach (Lemma 1). The complete model equations are in Appendix B.

5.1 Experiment I: Compare with traditional robust control certification and Monte-Carlo based attacks

Tightness of Algorithm 1. In this experiment, we use the linearized cart-pole model with no uncertainty () to compare the tightness of Algorithm 1 (Theorem 1) and the Lipschitzbased robust control baseline (Lemma 1). Given a user-specified state deviation limit , we use binary search to call Algorithm 1 repeatedly to obtain the largest possible such that the safety requirement is satisfied for any persistent attack . We plot as a function of as the blue curve in Figure 2a. Clearly, the area below the blue curve is the safe region certified by Theorem 1. Likewise, the area below the red curve is the safe region certified by the traditional robust control theory (Lemma 1), where the local Lipschitz constant is obtained via a sampling-based approach4. Figure 2a validates our claim in Theorem 2 – the safe region certified by

Figure 3: Comparison between our designed attack with Monte-Carlo based attack. (a): Our designed attack is injected between 100s and 150s. Monte-Carlo based random attack is injected before 100s and after 150s. (b): Our designed attack causes a huge state deviation between 100s and 150s compared to the Monte-Carlo based attack.

Our attack Monte-Carlo attack Monte-Carlo attack Our attack

Our attack Monte-Carlo attack Monte-Carlo attack Our attack

Lemma 1 is always a subset of the safe region certified by Theorem 1. For (the vertical green line in Figure 2a), Algorithm 1 can certify an attack level that is 5.3 times larger than the one using Lemma 1. Note that there is a flat dashed line at attack level = 1 for the robust control approach. This is because when the attack level is greater than 1, the conditions (10a) - (10c) of Lemma 1 no longer hold. In other words, the maximum attack level that can be certified by Lemma 1 is 1 regardless of the state deviation limit.

Note that in this experiment, the local Lipschitz constant used in Lemma 1 is only a lower bound because it is obtained via a sampling-based approach. Therefore, the only reason that can explain the gap between the blue curve and the red curve is that the neural network certification algorithm [9] gives a much tighter input-output characterization of the neural network policy than the Lipschitzbased method (which uses ). Other Lipschitz-based methods also have this limitation [17, 18, 19, 20]. In brief, our Algorithm 1 outperforms the robust control baseline and can achieve up to 5.3 times better robustness certificate.

In Figure 2a, the area above the black curve is the unsafe region where our attack algorithm (14) (with T = 2500) is able to make the state deviation exceed the user-specified limit. When the attack level is small (< 1), we can see from Figure 2a that the black, blue, and red curves overlap. This means that our certification bound is tight in this region. When the attack level increases (or the state deviation limit increases), we start to see a gap between our certification algorithm and our attack algorithm. Future research will attempt to further bridge this gap.

Our attack algorithm. In Figure 2b, we show that our simple attack algorithm (14) is far more effective than an exhaustive Monte-Carlo simulation. Specifically, the area above the black curve is the unsafe region discovered by our attack algorithm (14) (with T = 2500), while the area above the brown curve is the unsafe region discovered by Monte-Carlo simulation with 1 million time steps. There is a huge gap between the unsafe region found by the two methods. This is a strong evidence that the safety of a neural network control system cannot be certified using exhaustive Monte-Carlo simulation. We need to use the certification framework developed in this paper to guarantee the safeness of a neural network control system.

As a concrete example, at attack level 5 (the horizontal red line in Figure 2b), the mean, standard deviation, and maximum of the state deviation in 1 million steps Monte-Carlo simulation are 0, 1.1, and 5.2, respectively. One may conclude that the probability of seeing a state deviation greater than 5.2 is and falsely claim that the area below the brown region is safe. Unfortunately, this statement is not true when the perturbation sequence is adversarial – for the same level of perturbation, the maximum state deviation found by our algorithm is 31.4, which is 29 standard deviation away. We show the perturbation sequence (input) and the state deviation (output) as functions of time in Figures 3a and 3b. In Figure 3a, we inject our designed attack between 100s and 150s (sampling time = 0.02s, and thus the number of time step is T = 2500). We inject Monte-Carlo based random

Figure 4: Experiment II: Algorithm 1 works on learned models and non-Lipschitz neural network policies. (a): Compare Algorithm 1 on various learned models [27] and the true model. (b) Compare certificates on Lipschitz and non-Lipschitz (quantized) neural network policies.

attack with the same attack level before 100s and after 150s. It is clear from Figure 3b that the state deviation is significantly higher when we inject our attack. Specifically, our attack algorithm can excite the resonance of the closed loop system, while an exhaustive Monte-Carlo simulation usually cannot. We should also note that here we only consider an one dimensional perturbation sequence. Our algorithm (14) can be used to design a multi-dimensional perturbation. The gap between our approach and the Monte-Carlo approach will be even larger in the multi-dimensional setting.

In summary, Figures 2a and 2b show that both our certification algorithm and attack algorithm are significantly better than the traditional methods.

5.2 Experiment II: Certificates on learned dynamics and non-Lipschitz policies

Algorithm 1 on learned models. In many reinforcement learning applications, the mathematical model of the system dynamics (1) - (6) is not available in the first place. Therefore, we need to learn the model before performing the certification task. In this experiment, we show that Algorithm 1 works well on learned models – even when the underlying system dynamics is unknown, unstable, and nonlinear.

We use the method proposed in [27] to learn the nominal model P and a conservative estimate of the uncertainty . For each episode, we run the simulation using the black-box nonlinear cart-pole model with random control action for 30 time steps. We run N episodes of simulation to collect the data, then we solve a regression problem to estimate the system matrices for the nominal model P. To get a conservative bound of the modeling error , we synthesize 100 bootstrap samples of P and use the element-wise maximum deviation from the nominal model to over-approximate the model uncertainty . Empirically, there is high probability that the modeling error is bounded by Beside the bootstrap approach, one can also use other algorithms proposed in [27] to get a stronger mathematical guarantee on the bounds of modeling error if certain technical conditions hold. We include a detailed description of our model learning procedure in Appendix B.

In Figure 4a, we show the certification result of Algorithm 1 on models learned with different number of episodes N. As N increases, the 100 bootstrap samples become more consistent and thus the size of the modeling error shrinks. For N = 100, we can see from Figure 4a that the safe region certified by Algorithm 1 is very close to that with the true linearized model with no uncertainty. This experiment shows that our method indeed works well even when the model of the system dynamics is unknown in the first place. Given a nonlinear and unknown plant model interconnected with a neural network control policy, as long as we can find a nominal LTI plant P and bound the modeling error by , we can use Algorithm 1 to certify the boundedness of the closed loop system under persistent adversarial perturbation.

Algorithm 1 on non-Lipschitz neural network policies. In Figure 4b, we show that Algorithm 1 can be applied on a non-Lipschitz and discontinuous neural network policy. Specifically, we assume that the output of the neural network is quantized into discrete levels. This is a common setting in many reinforcement learning tasks, in which the control action is chosen from a discrete set. When the neural network output is quantized into discrete levels, the neural network policy becomes discontinuous and the closed loop system may not have a stable equilibrium. Therefore, the Lipschitz-based and the stability based methods [17, 18, 19, 20] cannot be used to certify the boundedness of the closed loop system. On the other hand, our Algorithm 1 can still be applied in this case – the area below the light blue curve in Figure 4b is the safe region certified by Algorithm 1. This is because we use the static neural network certification tools [8, 9, 10, 11, 12, 13] to characterize the input-output relation of a neural network policy, which works even when the policy is not Lipschitz continuous.

6 Conclusions and Future Works

Neural networks have shown superior performance in various control tasks in reinforcement learning [1], yet people have concerns using them in safety critical systems because it is hard to certify its robustness under adversarial attacks. In this paper, we extended the neural network certification tools [7, 8, 9, 10, 11, 12, 13] into a dynamic setting and developed an algorithm to certify the robustness of a neural network policy in a feedback control loop under persistent adversarial attack. We showed both theoretically and empirically that our method outperforms the traditional Lipschitz-based robust control approach and works on situations where the model dynamics is unknown in the first place. We also developed an attack algorithm and showed that it can discover the vulnerability of a neural network control system while an exhaustive Monte-Carlo simulation cannot – this suggested that a mathematical-based certification framework, like the one developed in this paper, is necessary to ensure the safety of a neural network control system. The key idea of our method is combining static neural network certification algorithms with robust control using invariant set principle. This idea is fundamental and can be extended to more general settings in many directions. In the future, we will further tighten the certification bound by exploring more general characterization of the invariant set.

References

[1] T. Zhang, G. Kahn, S. Levine, and P. Abbeel, “Learning deep control policies for autonomous aerial vehicles with mpc-guided policy search,” in 2016 IEEE international conference on robotics and automation (ICRA), pp. 528–535, IEEE, 2016.

[2] S. Huang, N. Papernot, I. Goodfellow, Y. Duan, and P. Abbeel, “Adversarial attacks on neural network policies,” arXiv preprint arXiv:1702.02284, 2017.

[3] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” arXiv preprint arXiv:1312.6199, 2013.

[4] C. Xie, J. Wang, Z. Zhang, Y. Zhou, L. Xie, and A. Yuille, “Adversarial examples for semantic segmentation and object detection,” in ICCV, https://arxiv.org/pdf/1703.08603, 2017.

[5] R. Jia and P. Liang, “Adversarial examples for evaluating reading comprehension systems,” in Empirical Methods in Natural Language Processing (EMNLP), Outstanding paper award, https://arxiv.org/pdf/1707.07328, 2017.

[6] M. M. Cisse, Y. Adi, N. Neverova, and J. Keshet, “Houdini: Fooling deep structured visual and speech recognition models with adversarial examples,” in Advances in Neural Information Processing Systems, pp. 6980–6990, https://arxiv.org/pdf/1707.05373, 2017.

[7] G. Katz, C. Barrett, D. L. Dill, K. Julian, and M. J. Kochenderfer, “Reluplex: An efficient smt solver for verifying deep neural networks,” in International Conference on Computer Aided Verification, pp. 97–117, Springer, 2017.

[8] J. Z. Kolter and E. Wong, “Provable defenses against adversarial examples via the convex outer adversarial polytope,” ICML, 2018.

[9] T.-W. Weng, H. Zhang, H. Chen, Z. Song, C.-J. Hsieh, D. Boning, I. S. Dhillon, and L. Daniel, “Towards fast computation of certified robustness for relu networks,” ICML, 2018.

[10] T. Gehr, M. Mirman, D. Drachsler-Cohen, P. Tsankov, S. Chaudhuri, and M. Vechev, “Ai2: Safety and robustness certification of neural networks with abstract interpretation,” in IEEE Symposium on Security and Privacy (SP), vol. 00, pp. 948–963, 2018.

[11] K. Dvijotham, R. Stanforth, S. Gowal, T. Mann, and P. Kohli, “A dual approach to scalable verification of deep networks,” UAI, 2018.

[12] A. Boopathy, T.-W. Weng, P.-Y. Chen, S. Liu, and L. Daniel, “Cnn-cert: An efficient framework for certifying robustness of convolutional neural networks,” in AAAI, Jan 2019.

[13] V. R. Royo, R. Calandra, D. M. Stipanovic, and C. Tomlin, “Fast neural network verification via shadow prices,” arXiv preprint arXiv:1902.07247, 2019.

[14] B. Kiumarsi, K. G. Vamvoudakis, H. Modares, and F. L. Lewis, “Optimal and autonomous control using reinforcement learning: A survey,” IEEE transactions on neural networks and learning systems, vol. 29, no. 6, pp. 2042–2062, 2017.

[15] J. Garcıa and F. Fernández, “A comprehensive survey on safe reinforcement learning,” Journal of Machine Learning Research, vol. 16, no. 1, pp. 1437–1480, 2015.

[16] Y. Chow, O. Nachum, E. Duenez-Guzman, and M. Ghavamzadeh, “A lyapunov-based approach to safe reinforcement learning,” in Advances in Neural Information Processing Systems, pp. 8092–8101, 2018.

[17] F. Berkenkamp, M. Turchetta, A. Schoellig, and A. Krause, “Safe model-based reinforcement learning with stability guarantees,” in Advances in neural information processing systems, pp. 908–918, 2017.

[18] S. M. Richards, F. Berkenkamp, and A. Krause, “The lyapunov neural network: Adaptive stability certification for safe learning of dynamic systems,” arXiv preprint arXiv:1808.00924, 2018.

[19] M. Jin and J. Lavaei, “Stability-certified reinforcement learning: A control-theoretic perspective,” arXiv preprint arXiv:1810.11505, 2018.

[20] M. Jin and J. Lavaei, “Control-theoretic analysis of smoothness for stability-certified reinforcement learning,” in 2018 IEEE Conference on Decision and Control (CDC), pp. 6840–6847, IEEE, 2018.

[21] R. Ivanov, J. Weimer, R. Alur, G. J. Pappas, and I. Lee, “Verisig: verifying safety properties of hybrid systems with neural network controllers,” in Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control, pp. 169–178, ACM, 2019.

[22] A. Athalye, N. Carlini, and D. Wagner, “Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples,” ICML, 2018.

[23] W. Xiang and T. T. Johnson, “Reachability analysis and safety verification for neural network control systems,” arXiv preprint arXiv:1805.09944, 2018.

[24] M. A. Dahleh and J. B. Pearson, “-optimal feedback controllers for mimo discrete-time systems,” IEEE Transactions on Automatic Control, vol. 32, no. 4, pp. 314–322, 1987.

[25] K. Zhou, J. C. Doyle, and K. Glover, Robust and optimal control, vol. 40. Prentice hall New Jersey, 1996.

[26] A. V. Oppenheim, A. S. Willsky, and S. H. Nawab, Signals & systems. Prentice-Hall, Inc., 1996.

[27] S. Dean, H. Mania, N. Matni, B. Recht, and S. Tu, “On the sample complexity of the linear quadratic regulator,” arXiv preprint arXiv:1710.01688, 2017.

[28] B. Recht, “A tour of reinforcement learning: The view from continuous control,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 2, pp. 253–279, 2019.

[29] M. Khammash and J. Pearson, “Performance robustness of discrete-time systems with structured uncertainty,” IEEE Transactions on Automatic Control, vol. 36, no. 4, pp. 398–412, 1991.

[30] A. Packard and J. Doyle, “The complex structured singular value,” Automatica, vol. 29, no. 1, pp. 71–109, 1993.

[31] K. J. Aström and R. M. Murray, Feedback systems: an introduction for scientists and engineers. Princeton university press, 2010.

[32] V. Blondel and J. N. Tsitsiklis, “Np-hardness of some linear control design problems,” SIAM Journal on Control and Optimization, vol. 35, no. 6, pp. 2118–2127, 1997.

[33] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.

[34] A. Hill, A. Raffin, M. Ernestus, A. Gleave, R. Traore, P. Dhariwal, C. Hesse, O. Klimov, A. Nichol, M. Plappert, A. Radford, J. Schulman, S. Sidor, and Y. Wu, “Stable baselines.” https://github.com/hill-a/stable-baselines, 2018.

[35] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, “Openai gym,” 2016.

[36] H. Zhang, T.-W. Weng, P.-Y. Chen, C.-J. Hsieh, and L. Daniel, “Efficient neural network robustness certification with general activation functions,” in NIPS, dec 2018.

[37] M. Fazlyab, A. Robey, H. Hassani, M. Morari, and G. J. Pappas, “Efficient and accurate estimation of lipschitz constants for deep neural networks,” arXiv preprint arXiv:1906.04893, 2019.

[38] H. K. Khalil, Nonlinear systems, vol. 3. Prentice hall Upper Saddle River, NJ, 2002.

Appendix A: proof of Theorems

Here we give the proofs for Lemma 1 and Theorems 1 and 2. We also introduce Corollary 1, which is the global version of Lemma 1.

Proof of Lemma 1

Proof of Lemma 1. We first show that (10a) - (10b) is a sufficient condition for global stability when is the global Lipschitz constant. From the assumption, we have

From condition (10a), we can bound the

Then, we have

From condition (10b), we can bound the

This shows that the to gain from w to y is bounded. We can use similar procedure to show that the to gain from w to are all bounded. This shows the input-output stability of the closed loop system when the global Lipschitz constant of the neural network policy is

Next, we consider the case where is valid only over a local region know that the right-hand-side of (18) is less than or equal to given . Thus, y will never go outside the local region where we calculate the Lipschitz constant valid perturbation (7). This completes the proof.

The global version of Lemma 1

The following Corollary is the global version of Lemma 1.

Corollary 1. Consider a stable LTI plant (1) - (4) interconnected with a neural network policy (5) and a dynamic uncertainty block (6) as shown in Figure 1. Assume that the persistent perturbation w lies in the set given by (7). Suppose that the neural network policy has a finite to gain for all y, and the uncertainty block has the property . If the following conditions hold:

then the closed loop system in Figure 1 is input-output stable.

Proof of Theorem 1

The proof of Theorem 1 relies on the concept of a positively invariant set, which is defined as follows: Definition 1 (from [38]). A set M is said to be a positively invariant set with respect to the dynamics

The proof of Theorem 1 is given as follows:

Proof of Theorem 1. We use mathematical induction to show that is a positively invariant set of the closed loop dynamical system if the three conditions of the theorem are given. From the zero initial condition assumption in (4), we have

where the last inequality is from the third condition of the theorem. Then from the first condition of the theorem, we have . Similarly, we have

Then from the second condition of the theorem, we have . This shows . Assume that we have for all . From equation (9b), we have

where the last inequality is from the third condition of the theorem. Similarly, we can derive from (9c). The first two conditions of Theorem 1 then imply and , and thus we have . Using mathematical induction, we conclude that for all and the closed loop feedback signals and state are bounded within the set specified by the theorem.

Proof of Theorem 2

Proof of Theorem 2. We need to show that when (10a) - (10c) and the locally Lipschitz continuous assumption of Lemma 1 are satisfied, we can always construct a quadruplet satisfying the three conditions of Theorem 1. Consider the following equation:

with scalar variables and . For any , we have and because all the elements in (22) are positive according to the conditions (10a) - (10b). In addition, we can show that is equivalent to the left-hand-side of (10c). Therefore, we have from (10c) – this means that is always contained within the region where we calculate the local Lipschitz constant of the neural network policy . It is then straightforward to verify that , satisfy the first two conditions of Theorem 1 given the locally Lipschitz continuous assumption of Lemma 1.

For the third condition of Theorem 1, it can be verified that (22) is a solution to the following inequality:

Finally, we note the following inequality

from the fact that the norm is selecting the maximum row sum. Therefore, we have

We can see that if the conditions of Lemma 1 hold, we can always construct a quadruplet satisfying all the three conditions of Theorem 1. The converse is not true. Therefore, Theorem 1 can be applied on a strictly larger class of problems than Lemma 1.

Appendix B: cart pole model

We consider a cart-pole problem with being the displacement of the cart and the angle of the pole. The dynamics is given by

We use the default model parameters from stable baselines: g = 9.8, M = 1, m = 0.1, and l = 0.5. Using Euler discretization, the nonlinear discrete time cart pole model is given by

with sampling time

Linearized model for Experiment I (Section 5.1)

Define the state vector . The linearized cart-pole model around the origin is given by

Note from that we have an one dimensional perturbation on the pole angle measurement. The requirement is to certify that the actual pole angle is within the user-specified limit, that is,

As explained in Section 5.1, these assumptions are made for the ease of illustrating the result. Our method can work on much more general model with the form defined in Section 2.

Learned model for Experiment II (Section 5.2)

When the model equation is unknown in the first place, we can collect the data from the simulator and fit the data to a model. Let be the state vector x at time t for the i-th episode from the j-th bootstrap sample, for j = 0, 1, . . . , 100. For the j-th bootstrap run, we solve the following least square problem to obtain the system matrices

with T = 30, x from the simulator, and u randomly generated. We then find a pair of non-negative matrices

to over-approximate the modeling error of the nominal model . The learned model used by our experiment in Section 5.2 is then given by

where is the non-negative matrix used in Algorithm 1.

designed for accessibility and to further open science