Technology development efforts in autonomy and cyber-defense have been evolving independently of each other over the past decade. In this paper, we report our ongoing effort to integrate these two presently distinct areas into a single framework. To this end, we propose the two-player partially observable stochastic game formalism to capture both high-level autonomous mission planning under uncertainty and adversarial decision making subject to imperfect information. We show that synthesizing sub-optimal strategies for such games is possible under finite-memory assumptions for both the autonomous decision maker and the cyber-adversary. We then describe an experimental testbed to evaluate the efficacy of the proposed framework.
The growing ubiquity of autonomous systems, their use in ever more remote and unknown environments, and the increasing sophistication of cyber threats are driving a need for unprecedented system resilience, coupling robust autonomy with efficient cyber-defense strategies. Consider the push to develop swarms of smallsats in low Earth orbit. Cost-effective operation of such swarms requires improved autonomy capabilities, both onboard and on the ground. However, complex autonomous behavior makes such systems susceptible to malicious tampering. Similarly, current unmanned air/ground/underwater systems rely on various signals for communication and localization and are already vulnerable to spoofing attacks. A GPS spoofing attack against such systems could result in malicious GPS coordinates being fed to the vehicle, causing it to be (mis)guided at an adversary’s behest. A resilient autonomous system should be able to detect attacks against itself, diagnose the probable causes, and automatically take corrective actions while ensuring that the system’s low- and high-level goals and objectives are achieved.
However, a primary challenge to achieving this vision of integrated cyber and physical resilience is that technology development efforts in autonomy and cyber-defense are presently evolving independently of each other. Our work aims to reverse this trend. Our overall goal is to develop and demonstrate resilient autonomy for autonomous agents by extending existing risk-aware planning and execution capabilities with a combination of state-of-the-art model-based reasoning for situational and self-awareness and active cyber-defense mechanisms.
*The work described in this paper was performed at the California Institute of Technology, and at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration (NASA).
tute of Technology, 1200 E. California Blvd., Pasadena, CA 91125. [mrahmadi,ames]@caltech.edu.
Propulsion Laboratory, California Institute of Technology, 4800 Oak Grove Dr, Pasadena, CA 91109. [arun.a.viswanathan, michel.d.ingham,kymie.tan]@jpl.nasa.gov.
Fig. 1. A simplified model of an autonomous system.
Current cyber adversaries can study the defender’s behavior, identify security weaknesses, and adapt their actions accordingly. To tackle these security challenges, cyber-agents require adversarial decision making under uncertainty. Furthermore, agents cannot directly observe their adversary’s true state or intention. Hence, active cyber-defense methods must deal with partial observations and imperfect or incomplete information. A game-theoretic framework known as partially observable stochastic games (POSGs) provides a promising mathematical formalism for these capabilities.
In this paper, we report our preliminary methodology based on POSGs to integrate high-level autonomy and adversarial decision making. The method targets cyber-physical threats posed by active cyber-adversaries who modify their strategy in reaction to defensive actions, as seen, for example, in the Stuxnet attack. We show that the solution to the POSG can be cast as an optimization problem. Then, we propose an experimental setup to evaluate our technique. In summary, we hope to make the following contributions:
• Novel high-level resilient autonomy in the presence of active cyber-attacks leveraging the POSG framework;
• Demonstration of an integrated "defense-in-depth" capability for secure autonomy of cyber-physical systems.
The rest of the paper is organized as follows: Section II discusses the threat model for a cyber-physical system such as a UAV, an autonomous robot, or a swarm of spacecraft; Section III discusses our proposed methodology using POSGs; Section IV discusses our experimental evaluation methodology; and Section V presents our conclusions and future work.
In this section, we first describe a model of an autonomous system, followed by a description of adversarial goals and a high-level taxonomy of threats.
Fig. 2. Cyber-Physical Threat Model
Figure 1 shows a simplified model of an autonomous system (agent), containing two subsystems: cyber and physical. The cyber subsystem encapsulates functionality such as command and control logic, operating system, applications and any communications between the cyber components. Cyber components may be located on the agent or be external to the agent. Multi-agent systems may have a centralized cyber subsystem coordinating the agents. The physical subsystem encapsulates entities such as sensors, actuators, physical communication channels, and any other hardware comprising the autonomous system. An attacker would want to gain malicious control, cause damage, or deny service to prevent the autonomous system from achieving its goals. Referring to Figure 2, there are four different kinds of attacks an adversary could use to achieve their goals.
A cyber attack directly targets the components in the cyber subsystem. For example, a denial-of-service attack against the communication network of an autonomous system is a cyber attack.
A physical attack targets the components in the physical subsystem. For example, a ballistic impact is a type of physical attack which could damage physical components of an autonomous system. A physical attack often requires physical proximity to the system.
In a cyber-physical attack, an attacker leverages a cyber vulnerability with the intent to affect the physical subsystem. Examples include malicious input injection attacks, such as the malicious command or data injection seen in recent car hacks. Cyber-physical attacks are often the most devastating, as they can be initiated remotely and cause serious damage to the physical subsystem.
In a physical-cyber attack, an attacker influences the cyber subsystem by attacking the components in the physical subsystem. For example, an attack on the physical sensors of an autonomous system (say, the IMU) may cause inaccurate data to be sent upstream to the cyber components (for example, incorrect location information), thereby causing incorrect decision-making and response by the cyber component.
In our work, we focus on the cyber-physical and physical-cyber kinds of attacks, as these attacks cross boundaries and, as such, are often more subtle and difficult to diagnose, and consequently pose significant risk to missions. In addition, existing cyber or physical defenses generally do not protect against these attacks. In the next section, we describe a mathematical formalism considering cyber-physical and physical-cyber attacks.
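The four attack classes can be read as pairs of the subsystem through which the attacker enters and the subsystem that is ultimately affected. The following minimal sketch encodes that reading; the names `Subsystem`, `classify_attack`, and `crosses_boundary` are illustrative assumptions and are not part of the threat model itself.

```python
from enum import Enum


class Subsystem(Enum):
    """The two subsystems of the simplified autonomous-system model."""
    CYBER = "cyber"
    PHYSICAL = "physical"


def classify_attack(entry: Subsystem, target: Subsystem) -> str:
    """Name the attack class from the subsystem that is attacked (entry)
    and the subsystem that is ultimately affected (target)."""
    if entry is target:
        return f"{entry.value} attack"
    return f"{entry.value}-{target.value} attack"


def crosses_boundary(entry: Subsystem, target: Subsystem) -> bool:
    """True for the two boundary-crossing classes this work focuses on."""
    return entry is not target


# Spoofed sensor data (physical entry) that corrupts decision logic (cyber target).
label = classify_attack(Subsystem.PHYSICAL, Subsystem.CYBER)
```

Under this encoding, a denial-of-service attack on the communication network maps to `(CYBER, CYBER)`, while command injection that damages actuators maps to `(CYBER, PHYSICAL)`.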
A strategy for a stochastic game (SG) resolves all non-deterministic choices, yielding an induced Markov chain (MC), for which a probability measure over the set of infinite paths is defined by the standard cylinder set construction. These notions are analogous for Markov decision processes (MDPs).
In our framework, the states of the autonomous decision maker comprise the physical and mission states, e.g., robot location(s) and obstacles, whereas the states of the cyber-adversary are its internal states. These states are not directly observable to either player; the players must infer the probability of their opponent being in different states based on the observations received at every step of the game. Thus, we have a POSG as follows (see Figure 3).
Definition 3: A partially observable stochastic game (POSG) is a tuple consisting of the underlying SG, a finite set of observations for each of Player 1 and Player 2, and an observation function for each of Player 1 and Player 2.
Fig. 3. Three stages of an example POSG. The states of the players need to be estimated based on the observations and, in the case of the attacker, counteracted. The game starts with an initial observation.
We lift the observation function to paths: each path has an associated observation sequence.
Definition 4 (POSG Strategy): An observation-based strategy for Player i in POSG G is a strategy for Player i in the underlying SG that makes the same choice for all pairs of paths with the same observation sequence.
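In symbols, the condition in Definition 4 is commonly written as below. The notation is an assumption introduced here for illustration only: \sigma_i denotes the strategy of Player i, \pi and \pi' denote finite paths of the underlying SG, and O_i denotes Player i's observation function lifted to paths.

```latex
% Assumed notation: \sigma_i (strategy of Player i), \pi, \pi' (finite paths),
% O_i (observation function of Player i, lifted to paths).
\sigma_i(\pi) = \sigma_i(\pi')
\quad \text{for all finite paths } \pi, \pi' \text{ with } O_i(\pi) = O_i(\pi').
```

That is, a player cannot condition its choice on anything beyond what it has observed.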
Applying the strategy to a POSG G resolves all nondeterminism and partial observability, resulting in the induced Markov chain.
Since POSGs extend partially observable Markov decision processes (POMDPs) to multiple players, computing optimal strategies requires infinite memory in general. To circumvent this difficulty, we represent observation-based strategies with finite memory using finite-state strategies (FSSs) (see also FSSs in delay games). If such an FSS has n memory states, we say the memory size of the underlying strategy is n.
Definition 5 (FSS): A finite-state strategy (FSS) for Player i in POSG G is a tuple consisting of a finite set of memory states, an initial memory state, an action mapping into Distr(Act), and a memory update. FSSs with k memory states are called k-FSSs.
At each stage of the game, for each player, from a node n and the observation z in the current state of the POSG, the next action a is chosen from Act(z) randomly as given by the action mapping. Then, the successor node of the FSS is determined randomly via the memory update.
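The two-step execution of an FSS (sample an action, then sample the successor memory node) can be sketched as follows. This is a minimal illustration under stated assumptions, not an implementation of our method: the class `FiniteStateStrategy`, its field names, the observation labels, and the choice to let the memory update depend on the triple (node, observation, action) are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple

Node = int
Obs = str
Action = str


@dataclass
class FiniteStateStrategy:
    """Hypothetical container for an FSS; all field names are illustrative."""
    initial_node: Node
    # action_map[(node, obs)] -> {action: probability}
    action_map: Dict[Tuple[Node, Obs], Dict[Action, float]]
    # memory_update[(node, obs, action)] -> {next_node: probability}
    # (assumed signature; the dependence on the action is an assumption)
    memory_update: Dict[Tuple[Node, Obs, Action], Dict[Node, float]]


def sample(dist: dict, rng: random.Random):
    """Draw one outcome from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def fss_step(fss: FiniteStateStrategy, node: Node, obs: Obs, rng: random.Random):
    """One stage: pick an action from (node, obs), then pick the successor node."""
    action = sample(fss.action_map[(node, obs)], rng)
    next_node = sample(fss.memory_update[(node, obs, action)], rng)
    return action, next_node


def run_fss(fss: FiniteStateStrategy, observations: List[Obs], rng: random.Random):
    """Feed a sequence of observations through the FSS and collect the actions."""
    node, actions = fss.initial_node, []
    for obs in observations:
        action, node = fss_step(fss, node, obs, rng)
        actions.append(action)
    return actions


# A 2-FSS with degenerate (probability-one) distributions, for readability.
fss = FiniteStateStrategy(
    initial_node=0,
    action_map={
        (0, "ok"): {"Right": 1.0}, (0, "anomaly"): {"Up": 1.0},
        (1, "ok"): {"Right": 1.0}, (1, "anomaly"): {"Up": 1.0},
    },
    memory_update={
        (0, "ok", "Right"): {0: 1.0}, (0, "anomaly", "Up"): {1: 1.0},
        (1, "ok", "Right"): {0: 1.0}, (1, "anomaly", "Up"): {1: 1.0},
    },
)
```

In general the distributions need not be degenerate; the example uses probability-one entries only so that the behavior is easy to trace by hand.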
A POSG for Secure Autonomy: With the FSS assumption, the goal is then to maximize the probability of satisfying mission specifications, e.g., reaching a goal region while avoiding obstacles, in the presence of cyber-adversarial activity. Next, we formally define the game objective.
Game Objective: For a POSG G and a mission specification defined by a temporal logic formula, we consider the probability of satisfying this formula.
The specification is satisfied for a strategy and the POSG G with a given probability if the Markov chain induced by applying the strategy satisfies the specification with that probability. At this point, we have the following game formulation of the secure autonomy problem.
In Problem 1, we look for worst-case resilient strategies such that the probability of satisfying the specifications is maximized. Alternatively, we can search for resilient strategies that maximize the expected value of meeting the specifications in the presence of adversarial activity. Indeed, we can approximate the probability constraint with an expected-total-cost constraint. This applies to reachability-type formulae, i.e., eventually reaching a goal region represented by the states in T. The solution to Problem 1 can then be found by solving an optimization problem, as follows (a derivation is available for one-sided POSGs).
We define cost variables that represent the expected cost of reaching T, including the expected cost of reaching T from the initial state. A discount factor is used to ensure a finite total expected cost. We then have the optimization problem:
The objective in (1) implies that the decision maker is minimizing the cost of reaching T from the initial state, whereas the cyber-adversary is trying to maximize the cost. We assign the expected cost of the states in the target set T to 0 by the constraints in (2). We ensure that the strategies of the decision maker and the cyber-adversary are well-defined with the constraints in (3) and (4). The constraints in (5)–(6) give the computation for the expected cost in the states of the POSG via dynamic programming.
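To illustrate the dynamic-programming flavor of the expected-cost computation, the sketch below evaluates the discounted expected cost of reaching a target set in a Markov chain, i.e., once both players' strategies are fixed. It is not the optimization problem above: the function name `expected_reach_cost`, the unit cost per step, the dictionary encoding of the chain, and the example chain are assumptions made for illustration.

```python
def expected_reach_cost(P, targets, gamma=0.9, step_cost=1.0,
                        tol=1e-12, max_iter=100_000):
    """Discounted expected cost of reaching `targets` in a Markov chain.

    P[s] is a {successor: probability} dictionary for every state s.
    Target states keep cost 0; every other state pays `step_cost` per step
    (an assumed cost model) and discounts the successor costs by `gamma`.
    """
    cost = {s: 0.0 for s in P}
    for _ in range(max_iter):
        delta = 0.0
        for s in P:
            if s in targets:
                continue  # cost of target states is fixed at 0
            new = step_cost + gamma * sum(p * cost[t] for t, p in P[s].items())
            delta = max(delta, abs(new - cost[s]))
            cost[s] = new
        if delta < tol:
            break
    return cost


# Hypothetical induced chain: s0 moves to s1; s1 reaches the goal w.p. 0.5.
chain = {
    "s0": {"s1": 1.0},
    "s1": {"goal": 0.5, "s1": 0.5},
    "goal": {"goal": 1.0},
}
costs = expected_reach_cost(chain, targets={"goal"}, gamma=0.9)
```

For this chain the fixed point can be checked by hand: the cost of `s1` solves c = 1 + 0.9 · 0.5 · c, so c = 1/0.55, and the cost of `s0` is 1 + 0.9 · c. In the game setting, the decision maker would seek to lower the initial-state cost and the cyber-adversary to raise it.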
We will develop methods based on heuristics and nonlinear programming to solve the resultant POSGs algorithmically, and we will study trade-offs between resilience (cyber side) and mission goals (physical side). Preliminary work on solving POSGs was carried out for the case when only the adversary is subject to partial observation, with application to network security. Instead of solving the full game, we used model checking to synthesize a set of strong (sub-optimal) strategies for the adversary and then composed robust defensive strategies.
Fig. 4. Three robots involved in experimental evaluations at CAST: (left) quadruped, (center) Segway, and (right) Flipper.
The efficacy of the developed methods will be evaluated through experiments with three autonomous agents (a Segway, a quadruped, and a Flipper robot) in Caltech’s Center for Autonomous Systems and Technologies (CAST), as depicted in Figure 4. The quadruped and the Flipper robot will be tasked with locating the target and the obstacles, respectively, whereas the Segway will retrieve the target once the quadruped and Flipper have explored the area. Flipper is equipped with a 3D LIDAR and a router. The quadruped robot is equipped with a high-resolution camera, an Inertial Measurement Unit (IMU), and a router. The Segway has only wheel odometry, an IMU, and a router. Centralized decision making is carried out on a computer connected to the robots via a Wi-Fi network. The sensor signals of each robot are also sent back to the computer via the same network. Our previous experiments in this setting were concerned with safe autonomy enforced by discrete-time barrier functions, i.e., in the absence of cyber-adversaries (an experimental demonstration is available online). The goal of our next set of experiments is to find and retrieve the target in the presence of cyber-adversarial activity. This experimental setup is described next.
The states of the POSG for the decision maker correspond to the locations of each agent, the obstacles, and the goal. The actions of the decision maker are to move each agent Left, Right, Up, or Down. The two states of the cyber-adversary are Quadruped and Flipper, corresponding to the two surveying agents. The actions of the attacker are TakeDown and Wait. If TakeDown is chosen at one stage of the game, for example, for the Flipper robot, the robot will not move in the next step and its observation cannot be used for path planning. On the other hand, Wait means that no action is taken by the adversary. The objective of the decision maker is then to maximize the probability of retrieving the target while avoiding obstacles, whereas the cyber-adversary attempts to minimize this probability. This POSG fits in the framework of Section III and can be used to assure high-level mission autonomy as well as cyber-resilience. This initial abstract problem formulation will provide a basis for more realistic (high-fidelity) solutions to the real-world problem in future work, e.g., examining real injected cyber-attacks and practical defensive responses.
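The effect of the adversary's TakeDown and Wait actions on one stage of the game can be sketched as follows. This is a simplified, deterministic illustration under stated assumptions: the grid coordinates, the convention that Up increases y, and the names `MOVES` and `apply_stage` are hypothetical, and the adversary's action passed in is the one chosen at the previous stage.

```python
from typing import Dict, Tuple

# Assumed grid convention: Right/Left change x, Up/Down change y.
MOVES = {"Left": (-1, 0), "Right": (1, 0), "Up": (0, 1), "Down": (0, -1)}

Position = Tuple[int, int]


def apply_stage(positions: Dict[str, Position],
                moves: Dict[str, str],
                adversary_state: str,
                pending_adversary_action: str):
    """Apply one stage of the game.

    `adversary_state` is the surveying agent the adversary is focused on
    ("Quadruped" or "Flipper"); `pending_adversary_action` is "TakeDown" or
    "Wait" as chosen at the previous stage. A taken-down agent does not move
    and its observation is flagged as unusable for path planning.
    """
    disabled = adversary_state if pending_adversary_action == "TakeDown" else None
    new_positions, usable_observation = {}, {}
    for agent, (x, y) in positions.items():
        if agent == disabled:
            new_positions[agent] = (x, y)      # agent is frozen for this step
            usable_observation[agent] = False  # its sensing is discarded
        else:
            dx, dy = MOVES[moves[agent]]
            new_positions[agent] = (x + dx, y + dy)
            usable_observation[agent] = True
    return new_positions, usable_observation


positions = {"Quadruped": (0, 0), "Flipper": (2, 2), "Segway": (1, 0)}
moves = {"Quadruped": "Right", "Flipper": "Up", "Segway": "Left"}
attacked, usable = apply_stage(positions, moves, "Flipper", "TakeDown")
```

With Wait in place of TakeDown, all three agents move as commanded and every observation remains usable. Obstacles, the goal, and sensing noise are omitted from this sketch.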
We described our ongoing research on the fusion of autonomous decision making and active cyber-resilience. We proposed a POSG that can capture high-level mission specifications, uncertainty, partial observation, and adversarial decision making. Although finding optimal strategies for POSGs is undecidable, we discussed finite-memory strategies as computationally tractable alternatives. Finally, we presented an experimental testbed, a methodology, and a case study to evaluate our secure autonomy techniques in the future.
The authors thank Prof. Richard M. Murray at Caltech and Dr. Nils Jansen at the Radboud University Nijmegen.
 M. Ahmadi, M. Cubuktepe, N. Jansen, S. Junges, J.-P. Katoen, and U. Topcu. The partially observable games we play for cyber deception. In 2019 American Control Conference, 2019.
 M. Ahmadi, M. Ono, M. Ingham, R. M. Murray, and A. D. Ames. Risk-averse planning under uncertainty. In 2020 American Control Conference, 2020.
 M. Ahmadi, R. Sharan, and J. W. Burdick. Stochastic Finite State Control of POMDPs with LTL Specifications. arXiv:2001.07679, Jan 2020.
 M. Ahmadi, A. Singletary, J. W. Burdick, and A. D. Ames. Safe Policy Synthesis in Multi-Agent POMDPs via Discrete-Time Barrier Functions. 58th Conference on Decision and Control, Dec 2019.
 Christel Baier and Joost-Pieter Katoen. Principles of Model Checking. MIT Press, 2008.
 Krishnendu Chatterjee, Martin Chmelík, and Mathieu Tracol. What is decidable about partially observable Markov decision processes with ω-regular objectives. Journal of Computer and System Sciences, 82(5):878–911, 2016.
 Gregory Falco. The vacuum of space cyber security. In 2018 AIAA SPACE Forum and Exposition, page 5275, 2018.
 S. M. Giray. Anatomy of unmanned aerial vehicle hijacking with signal spoofing. In 2013 6th International Conference on Recent Advances in Space Technologies (RAST), pages 795–800. IEEE, 2013.
 Andy Greenberg. Hackers remotely kill a jeep on the highway – with me in it. https://www.wired.com/2015/07/hackers-remotely-kill-jeep-highway/, July 2015.
 A. Kott. Intelligent autonomous agents are key to cyber defense of the future army networks. The Cyber Defense Review, 3(3):57–70, 2018.
 Akshat Kumar and Shlomo Zilberstein. Dynamic programming approximations for partially observable stochastic games. In Twenty-Second International FLAIRS Conference, 2009.
 Ralph Langner. Stuxnet: Dissecting a cyberwarfare weapon. IEEE Security & Privacy, 9(3):49–51, 2011.
 C. McGhan, R. Murray, T. Vaquero, B. Williams, M. Ingham, M. Ono, T. Estlin, R. Lanka, O. Arslan, and M. Elaasar. The resilient spacecraft executive: An architecture for risk-aware operations in uncertain environments. In 2016 AIAA SPACE Forum and Exposition, 2016.
 Colin Tankard. Advanced persistent threats and how to monitor and deter them. Network Security, 2011(8):16–19, 2011.
 Sarah Winter and Martin Zimmermann. Finite-state strategies in delay games. Information and Computation, page 104500, 2019.