In this work we consider the design space [1] of behaviour planning—high level decision making—for autonomous driving. To simplify the design process, we decompose the design space into three principal axes of design choices, based on our practical experience [2] and with reference to the current state of the art. Within each axis, we discuss the inevitable qualitative trade-offs that exist and review the relevant literature. We illustrate our decomposition using feature diagrams [3]. In doing this, we identify potentially interesting areas of research within the behaviour planning design space. The motivation of our decomposition is as follows. Human driver control actions are continuous, yet driving also contains discrete episodes, arising from road connectivity, signs, signals, road-user interactions, etc. The vehicle must nevertheless follow a smooth continuous trajectory on the road. The spectrum of possible discrete and continuous abstractions is thus the first principal axes of design space that we consider. Some different representations derived from our autonomous driving stack are illustrated in Fig. 1. A common decomposition of the driving problem allocates high level discrete decisions (go straight, turn left, stop, ...) to a behaviour planner (BP), leaving low level continuous actions to be enacted by a local planner (LP). Choosing discrete actions is well-suited to procedural programs, while continuous actions can be found by optimization. In practice, the vehicle and external environment are not entirely predictable, such that high and low level control actions are dependent on each other, on the actions of other road users, and on the actual behaviour of the vehicle. This leads to the second principal axis we consider, which concerns the overall architecture of the motion planner. We consider the level of integration and communication between the
Fig. 1. Four environment representations used in motion planning for autonomous driving, ordered from least to most abstract.
behaviour planner and local planner, and the way prediction is incorporated, which we consider to have the most influence on BP design.
Our third and final principal design choice axis is the representation of decision logic. In addition to conventional programs and expert systems, recent advances in hardware and software make learned logic a plausible and attractive alternative.
Fig 3 illustrates our three principal axes of design choices in the form of a feature diagram [3]. The figure also contains a key to the visual syntax used elsewhere.
The remainder of the paper is structured as follows. We first define the motion planning problem in Section II. Then, in Section III, we consider the different ways that the motion planning environment can be represented, acknowledging our first axis of decomposition. In Section IV we describe the different possible architectures of the BP, considering our
Fig. 2. General architecture of motion planning. White boxes denote the scope of the current work.
second axis of decomposition. In Section V we outline the ways in which a BP’s decision making can be implemented, which is our third axis of decomposition. We conclude in Section VI.
The autonomous vehicle motion planning problem can be summarized as providing a reference trajectory, in continuous domain, that meets the specified requirements of safety, comfort, progress and energy efficiency, and which can be executed by the vehicle hardware, taking into account the vehicle’s dynamics. The task is typically decomposed into three sub-tasks: (i) mission planning; (ii) behaviour planning; and (iii) local planning. These are illustrated in Fig. 2. Mission planning is the process of making the highest level decisions with regard to the proposed journey, such as deciding the sequence of roads to take, given the desired destination, the current position of the vehicle, the user’s preferences, and a priori assumptions about road conditions and availability. If the mission planner’s assumptions are contradicted along the way, or the user makes ad hoc changes to the requirements of the journey, the mission planner must re-plan. Such decision making and re-planning are standard on current satellite navigation devices. Since mission planning is already a welldiscussed problem, we do not consider this aspect of motion planning in the present work.
Given a proposed sequence of roads to follow, as provided by the mission planner, the BP must devise a sequence of discrete high level control actions to navigate through the environment. Control actions might include such basic manoeuvres as speed up, slow down and come to a stop. The actions must also be consistent with actual road conditions, so must be generated online according to sensor input. Perception is, therefore, a crucial ingredient of behaviour planning with many of its own challenges, including noise, occlusions and sensor fusion. In this work, we acknowledge the challenges of sensing by assuming that the motion planner is presented with a view of the ego vehicle and the external environment that may be incomplete and contain errors. Notwithstanding the problems of imperfect assumptions and sensing, the BP must react to the dynamic nature of the environment.
The objective of local planning is to devise a safe and smooth trajectory from the autonomous vehicle’s current position to a goal position, by avoiding obstacles, satisfying comfort requirements and generally respecting kinodynamic constraints. The abstract actions chosen by the BP must,
Fig. 3. Feature diagram of high-level design choices.
therefore, take into account the actual state of the ego vehicle and environment. The BP may parameterize its choices or otherwise communicate with the LP. The optimal degree of integration between the LP and the BP depends on the algorithms chosen and how the overall task is divided, which will be discussd in Section IV-A.
Design choices for the BP are often constrained by its input. This input is a representation of the driving environment modelled using information from on-board sensors and other sources, such as digital maps and Vehicle-To-Everything (V2X) communication. The continuous spatiotemporal information about the environment is often discretized and abstracted before reaching the BP, to simplify it and pass on only the most relevant details to the processing pipeline [4]. Such abstracted data requires reduced communication bandwidth and is more convenient to work with, aiding high frequency decision making. On the other hand, the simplifying assumptions may be imperfect, and less-abstract data may contain useful additional information. There thus exists a trade-off between the fidelity of decision-making and the computational burden.
The different types of representations are illustrated in the feature diagram in Fig. 4 and discussed below, ordered from least to most abstract. Although we treat them separately in the present context, we note here that a combination of these representations (as shown with the or operator in the figure) may also be used in practice to provide a single coherent model of the environment [5].
A. Raw Data
It is possible for the BP to use the direct output of on-board sensors and other sources (camera, radar, lidar, etc.), with minimal pre-processing. Typical lidar output is illustrated in Fig. 1(a). Such output contains the most information and the most noise. The complexity of making decisions with raw data makes this representation suitable only for end-to-end approaches, where the intermediate steps in the processing pipeline, such as road feature detection and path planning, are performed implicitly, e.g., by learning and encoding them into a neural network (NN). For instance, [6] transforms the raw pixels from a single camera sensor directly into steering commands using a convolutional neural network (CNN). The resulting lack of modularity makes this approach less scalable and opaque to human understanding.
Fig. 4. Environment representation design choices. See Fig. 3 for key.
B. Feature-based
A set of features that encapsulate the current environment can be extracted from the raw sensor data. This set of features can be represented in discrete space [7], continuous space [8], or a combination of both [9]. This representation attempts to simplify the information in the scene, leading to a modular architecture design, while maintaining all the important information. However, errors within the feature representation can arise either due to bad feature selection or poor feature extraction, as this is a complex task.
In the autonomous driving domain, it is common to represent a geometric road network as a continuous feature (see Fig. 1(b)), which aids in the planning problem. An example of a discrete feature is a determination of whether a vehicle is at a stop sign or not. Finally, a bounding box is one of the most common features in autonomous driving, containing both continuous features, such as location and size, as well as discrete features, such as the object classification.
C. Grid-based
The feature-based representation can be further refined such that the continuous features are mapped to discrete space. The act of discretization significantly simplifies the information, allowing it to be conveniently used in traditional programmed logic systems. However, the resulting representation is less detailed, which can potentially lead to errors in decision making.
An occupancy grid is an example of such a grid-based environment representation. A 2D occupancy grid, as depicted in Fig. 1(c), projects the road environment into the plane of the road surface and indicates the probability that a particular cell is occupied by an obstacle. The notion can be extended to 3D, e.g., voxel grids, and also equipped with object class and dynamics information to enable more subtle decision-making [4].
D. Latent Representation
Fig. 1(d) illustrates how raw sensor data can be compressed into a latent variable representation of the environment. While such a representation is typically more efficient, it may not be human-understandable. On the other hand, the compression may capture patterns that are not apparent to a human programmer. Typical encoding methods include Principal Component Analysis (PCA [10]) and Variational AutoEncoders (VAE [11]).
Since the input is not human-readable, the latent representation may be more amenable to data-driven planners. For example, a NN may create a condensed spatial and temporal representation of the environment that models not just the current state but also predicts future states, and a decoder can make this latent representation human-readable [12]. The final latent representation captures all the relevant information in a small vector. Some of these concepts are discussed further in Section V-B.
In this section, we discuss the possible ways in which a behaviour planner (BP) can be integrated with its associated modules, and how those choices influence the overall architecture of the planner. The principal design choices are illustrated in Fig. 5. The first design decision is to choose the right level of integration between BP and LP; the two planners can exist as separate or integrated into a single module. This choice is shown in the figure with an xor operator under Motion Planner Architecture. The second design decision is whether to have a dedicated prediction module for BP (shown in the figure as an optional feature). One factor that influences this decision is the operational environment of the autonomous vehicle. For simple driving situations, where a constant velocity behaviour model for other road users might suffice, the architecture might not need a dedicated prediction module. However, for complex situations, especially in situations of high volume traffic, there might be a need for a dedicated prediction module to share the computation burden. In some designs, instead of prediction being a part of BP, the task can also be split between perception and LP. For example, the perception module can detect additional activity attributes (running, walking, stationary, etc.) as part of pedestrian detection, which LP can further consume to select a behaviour model to predict the trajectory of the road user. Another option is to not have prediction at all and make decisions using only the current world state, although this may not lead to correct decisions.
A. Architecture of the Motion Planner
The integration of the BP within the motion planner is critical since this constrains the other elements and the overall architecture. As illustrated in Fig. 5, there are two mutually exclusive methods of integration of the BP: one in which the BP module is architecturally separated from the rest of the motion planner, and one where the BP may be partially or fully integrated.
1) Separated: Many early approaches to motion planning [13]–[15] have a top-down architecture design, described in the problem definition and illustrated by the positions of motion planner elements in Fig. 2. In this architecture, the behaviour planning process is done on a planned mission path which is to be refined by a local planner (LP). The BP restricts the complexity of the local planning algorithm by providing a single or set of possible high-level manoeuvres at each step of the planning process. This decomposes the planning problem into two tasks, i.e. behaviour planning and local planning, with a clear separation between the two. Although there is an advantage of simplicity to this design,
Fig. 5. Feature diagram of architecture design choices. See Fig. 3 for key.
the choice of complete separation can lead to computational redundancy. For example, to avoid generating manoeuvres that cannot be safely executed by the LP, the BP might need to solve a part of the LP’s trajectory planning problem. Even with the added computation, there can be conflicting solutions since the trajectory planning problem that the BP solves is typically less constrained than the one solved by the LP [13].
Another approach inverts the traditional architecture for planning. This type of BP framework generates multiple possible local paths then selects the best manoeuvre according to a given cost function [16]. This has the advantage of knowing the exact path that will be executed by the vehicle, and thus decisions are made on a more constrained problem, resulting in more accurate decisions. However, this approach results in increased computational redundancy as many paths now have to be considered.
2) Integrated: Recently, there has been a decline in human expert-based systems in favour of learned systems. Such learning based systems attempt to solve the entire motion planning problem [9], [17], and perhaps even the entire autonomous driving problem, in an end-to-end fashion [6], [18], [19]. By encapsulating the motion planning problem in a single pipeline, manoevre selection can be performed implicitly during the planning process. To achieve this, the learned system must rely on rewards or cost functions to inform its behavioral decision-making; this can be done either with [9] or without the use of explicitly defined manoevres [5]. While these approaches show great promise, they rely heavily on large labelled data sets and sophisticated simulation environments, both of which may not be readily available to academic researchers.
B. Prediction Architecture
In order to perform safe behaviour planning, many BP algorithms rely on predicting and reacting to the behaviour of other dynamic objects throughout the entire plan. Given their state information and past trajectories, the prediction task is to predict one or more of the following: the trajectory, the low-level motion primitives (accelerating, decelerating, and maintaining speed, etc.), or the intent (yielding, changing lane, crossing street, etc.).
Environment prediction approaches vary according to the environment representation, the design of the prediction model, the abstraction of the prediction, the degree of incorporation of prior human knowledge, prediction horizon, robustness to noise, etc. [20]–[22]. Despite this heterogeneity, three key categories of approaches stand out: (i) physicsbased models, which predict dynamic object motion solely by physical laws; (i) manoeuvre-based models, which model the intended road-user manoeuvres and predict their execution; and (i) interaction-aware models, which account for the interdependencies among various agents in the environment [23]. Most published approaches are physics- or manoeuvre-based, with attention only recently shifting to interactionbased techniques.
The prediction architecture, which is concerned with different levels of coupling between prediction and behaviour planning, is another key decision in the BP design space (Fig. 5). The first choice is between the prediction models being explicitly or implicitly defined. Explicitly defined prediction models take in state observations and make explicit predictions about the future behaviour of the road users. These models can be either external or internal to the BP.
External prediction models are completely decoupled from the planning process, and their outputs augment the environment representation that is fed to the planner. This design choice offers a clear interface between prediction and planning, which aids modularity. Most prediction approaches fall into this category [20]–[22], [24]–[27].
Internal prediction models, on the other hand, are integrated with the motion planning process and exist within the planner. Examples of this choice are planners that rely on a Partially-Observable Markov Decision Process (POMDP) model, which considers the intention of road users as a latent representation (see Section III-D) within the planning state space [28]–[31]. At each planning step, the planner maintains a belief of road users’ intentions, which is periodically updated based on new observations. Planning over this belief space of road user intentions rather than using fixed predicted intentions from external models has advantages, especially with respect to planning under uncertainty. In particular, it may result in safer trajectories, especially in cases where road users change their intention in response to the subject vehicle’s behaviour [28]. However, this design is currently computationally intractable for situations with a high number of road users.
Instead of having explicitly defined prediction models, a BP architecture may use implicitly defined ones, which results in an even higher degree of coupling between prediction and planning. Implicit prediction models do not represent the intent of road users as explicit features. Instead, the prediction algorithm learns to predict the behaviour of road users simultaneously while learning a driving policy based on the interaction of the subject vehicle with the environment. Most approaches based on reinforcement learning (RL) fall under this category, which we elaborate further in
Fig. 6. Feature diagram of decision logic design choices. See Fig. 3 for key.
Section V-B. The primary advantage of such a design is that it eliminates the need for a separate environment prediction module altogether, which reduces complexity and improves runtime efficiency. However, the human interpretability of implicit prediction models is lacking, which hinders their verification and validation.
Our third principal axis of decomposition is that of the underlying logic representation used to make high-level decisions. We categorize the decision logic of the planner into two paradigms: (i) logic represented through a set of explicitly programmed production rules, and (ii) logic representation relying on mathematical models with their parameters learned a priori. The final high level driving decision is made by exclusively following one of these two paradigms, as indicated by the highest level exclusive-or in Fig. 6.
While learned-logic models are able to generalize over diverse scenarios, they are not human-interpretable [32] and hence, difficult for humans to ensure safety. On the other hand, programmed logic requires extensive human effort and is subject to traditional software engineering principles. As a result, safety can be reasoned through traceability analysis.
A. Programmed Logic
Programmed logic systems have the advantage of interpretability which allows using traditional software engineering methods, such as inspections and walkthroughs to uncover problems in implementation. Unlike learned logic systems, they do not try to generalize over a set of examples and interactions but rather define consistent and verifiable resolution heuristics to arrive at a driving decision. The implementation of such systems can be achieved using two exclusively different programming paradigms as shown in Fig. 6: imperative and declarative.
Imperative systems outline a sequence of operations that expresses the control flow of a given program as it moves from one state to another. Declarative systems, on the other hand, are able to express the underlying logic without specifically describing the control flow. Due to their lack of strict system flow, declarative systems are more resistant to change since the system has fewer inter-dependencies, consequently making them harder to implement. Imperative systems, on the other hand, require a large number of rules and transitions in order to fully handle the complex cases of the driving task. As a result, this paradigm is hard to scale to more complex driving situations.
1) Imperative Systems: In an imperative system, the designers focus on sequences of operations that allow the transition from one state to another. Imperative functions work on concrete data and prepare the information flow required by the next function. The imperative programming paradigm brings about predictability as a result of well defined programmed instructions. Due to this reason, this paradigm is able to produce safety guarantees within a set of domain constraints.
State machines are an example of the imperative paradigm. Such systems exhaustively describe the list of all possible states as well as the transitions between them. One approach to design a state machine is to abstract the possible locations in the environment that can be encountered, and handle them independently [13]. For each abstract location, the environment state causes the system to transition from one state to another, and finally produces a manoeuvre. For example, [33] proposes six process flow diagrams allowing an autonomous vehicle to cross a multi-lane, controlled or uncontrolled intersection. While being able to handle pre-defined driving scenarios, such systems are often less able to deal with noisy environment states.
A framework for composing multiple state machines is proposed in [34]. This work reduces the exponential nature of the problem caused by a large number of transitions between states by using a rule-based system. It internally regulates the transitions to different state machines, each of which is responsible for selecting an appropriate manoeuvre.
In a more experimental context, the authors in [35] develop a state machine based solely on linear temporal logic (LTL) rules and show that a manoeuvre produced by such a system will not violate already existing rules. Their experiences show that this model of interaction between vehicles is sufficient for the ego to navigate without the risk of collision in a fully observable environment. However, this assumption of a fully observable environment is impossible to achieve in systems doing real world autonomous driving.
Although these systems show promise in the domain of safety, the authors do not believe that these approaches can adequately handle the noisy aspects of real world driving.
2) Declarative Systems: Declarative systems can represent the internal logic in the following two manners. (i) Expert systems obtain a decision through the evaluation of a world state over a set of rules and combine them through inferential processes to obtain a final behavioural decision. (ii) Optimization systems encapsulate driving behaviour as a set of mathematical variables which are optimized with respect to a notion of optimal behaviour.
Expert systems are able to derive a series of atomic propositions (AP) by combining them through an inferential process to arrive at a behaviour decision. For example, in [7], the rules of the road are applied to a discretized world state represented by a set of boolean APs, which are then combined hierarchically. A hierarchical representation simplifies the rule premises and thus, improves generality and robustness of such a rule-base. Nevertheless, such severe discretizations yield only sub-optimal behavioural decisions as they rely on imprecise world representations.
To avoid the problems caused by discretization, a continuous world representation could be utilized. One approach to utilize this representation is through a probabilistic rule system. In [36], the authors construct a probabilistic system and use a decision tree structure which operates on a continuous world state to produce a set of likelihood probabilities over all manoeuvres. Alternatively, a continuous world state can be processed using a set of fuzzy rules [37], which can handle noisy environmental states as well as a more complex set of scenarios.
Alternatively, the behavioural selection task can be solved by optimizing over a set of mathematically formulated criteria to achieve a correct behaviour. One approach to this type of optimization problem is the representation of the environment as an MDP. This allows the use of optimization solvers to select the path with the most optimal behaviour. Recent approaches use a POMDP representation model, which allows a world model to be constructed with incomplete world information [38], [39]. To make the solution tractable, the space of possible behaviours is usually discretized and decided in advance. The behaviours for the other agents can then be rolled out over time to predict potential future world states ued to optimize the egos behaviour [40]. MCTS can also be used to compute beliefs about the future world states [41]. POMDPs have also been developed to work over continuous environment representations, which better models reality [42]. Recent advancements have enabled POMDP solvers to operate in real time, allowing deployment on autonomous vehicles [38].
B. Learned Logic
Learned decision making can be categorized on the basis of whether the learning is from expert examples, or from interaction with a simulated environment that closely approximates the real world. There are also some approaches that use expert examples as well as a simulated interactive environment [43]–[46]. In practice, “learning from example” has been shown to be more robust, although it requires extensive labelled driving data [5]. On the other hand, “learning from interaction” allows for the acquisition of knowledge from a more diverse set of driving situations, even potentially dangerous situations.
1) Learning from Example: Simple end-to-end represen-
tational learning has been shown to successfully execute basic manoeuvres on roads and highways [6], [47], [48]. Since the driving task constitutes a large problem space, directly mapping sensor inputs to driving actions in this way needs a lot of data to learn effectively, meaning it is sample-inefficient. Due to the temporal nature of driving data, the authors [49] demonstrated an improvement using deep recurrent NN over standard deep NNs.
Another class of learning algorithms that use examples is learning from demonstrations. A behaviour planning system can use imitation learning to copy the behaviour of a reference driver [50]–[52]. While behaviours learned through imitation learning are typically considered more robust in practice than other kinds of learned logic models, they often cannot adapt to scenarios which were sparsely represented in the training data. Conditional imitation learning [53] and adding noise to the samples [5] can mitigate this. Recent work has established imitation learning as an interesting approach within this field.
It is also possible to use human demonstration examples to construct a reward function that can be used to learn a driving behaviour through RL [54], [55] (see Section V-B2 for RL). This process is called inverse RL (IRL). The reward function must be postulated correctly (linear, non-linear, etc) before learning, otherwise the obtained reward function might not generalize to the driving preferences expressed by the examples. If designed properly, IRL algorithms result in driving behaviours that respect driving preferences specific to the human expert(s) who performed the demonstrations [17]. However, postulating a reward family has proven to be a difficult challenge and is still an open issue.
2) Learning from Interaction: A common way to learn
from interaction is through RL, where an agent receives feedback from the simulated environment in the form of a reward and adjusts its behaviour to maximize the expected long term future reward [56], [57]. The two main ways to encapsulate the ego behaviour are using a set of discrete manoeuvres, or by using continuous control commands (steering angle, brake, and acceleration) [58], [59]. Learning the continuous controls attempts to learn a solution to both the motion planning problem as well as learning the controls of the vehicle.
Learning driving behaviours with RL typically uses NNs, which suffer from “catastrophic forgetting”, meaning they forget important previously learned knowledge when adding new knowledge. Mitigating catastrophic forgetting is an active area of research [60], [61]. The design of the reward function for RL is nontrivial, especially since there are usually multiple objectives, such as collision avoidance, comfort and traffic rule obedience. There are several ways to address this difficulty. The authors of [18] incorporate a dependence on the immediate action in the reward design to improve the learned behaviour. Beyond immediate rewards, it is also possible to conveniently express complex driving objectives with temporal causality using temporal logic [9], [62], [63].
The driving task can be also modelled as a hierarchical RL problem. In this formulation, the lowest level implementation of the hierarchy approximates a solution to the control and LP problem, while the higher level selects a maneuver to be executed by the lower level implementations [16], [64], [65]. One advantage of such a decomposition is that the problem is divided into sub-problems (the lower for LP and the higher for BP) which are subsolvable and have lower individual variance [64]. Empirical strategies like Monte Carlo Tree Search (MCTS) may be used to further improve the task of behaviour selection [9].
Another strategy is to use model-based RL, in which specific characteristics about the environment model are learned. For example, the authors in [66] learn a collision probability model and in combination with RL methods, boost performance. With model-based RL, learning is more sample-efficient [67, Chapter 8].
The state and action spaces can be made fuzzy [68], which has the advantage of reducing the problem complexity and improving robustness. It is also possible to construct a driving behaviour which does not directly use the learned RL behaviour, but rather performs some probabilistic post inference [69]. RL can also be done in a cooperative sense, by trying to maximize the system’s utility rather than just a single (ego) vehicle’s utility [70]; the system refers to the set of vehicles in the environment, all of which are following the learned behaviour.
3) Learning from Example and Interaction: Although
“learning from example” and “learning from interaction” are the two broad categories within “learned logic” design choice, an approach may not be entirely one or the other section. For example, in [64] an initial behaviour is obtained through imitation learning (learning from example), but improvements are made through interaction with a simulated environment (learning from interaction). While the hierarchical portion of this hybrid solution is still manual, the convergence in learning is faster since the initial behaviour is bootstrapped from an expert behaviour, and has to learn over a smaller problem space.
Biologically inspired methods such as Hebbian learning and genetic algorithms could also be combined with RL (learning from interaction) to learn a driving behaviour [43]– [45]. For these methods to be effective, it is imperative to appropriately design the fitness function and spend the requisite time needed for parameter tuning. However, if done correctly, these methods may be able to produce multiple alternative optimal driving strategies.
As discussed in Section V-B1 and V-B2, both “learning from example” and “learning from interaction” have advantages, like robustness and adherence to multiple objectives. If designed correctly then a combination can inherit the advantages of both approaches. However, if not appropriately designed, the resulting approach may be susceptible to the disadvantages highlighted in the previous sections. Nevertheless, the authors cautiously suggest that such an approach could be an interesting future direction of research.
In this work we have systematized many feasible and practical design choices for behaviour planning in the context of autonomous driving. There is an obvious trade-off between the representation of the world and the computational complexity of the motion planning problem, but advances in hardware and software have made learned systems, especially in the field of neural networks, a feasible means to make decisions over continuous space. Without learning, traditional robotic solutions cannot adequately handle complex, dynamic human environments, but ensuring the safety of learned systems remains a significant challenge. Hence, although the present work has implied many crisp design choices, we speculate that future high performance and safe behaviour planning solutions will be hybrid and heterogeneous, incorporating modules consisting of learned systems supervised by programmed logic.
There is currently no benchmark to evaluate the performance of BP technologies, however a quantitative comparison remains important future work.
[1] A. MacLean et al., “Questions, options, and criteria: Elements of design space analysis,” HumanComputer Interaction, vol. 6, no. 3-4, pp. 201–250, 1991.
[2] T. Pender, “Waterloo’s ‘Autonomoose’ hits 100-kilometre milestone,” Waterloo Region Record, August 2018. [Online]. Available: www.therecord.com/news-story/ 8859691-waterloo-s-autonomoose-hits-100-kilometre-milestone/
[3] K. C. Kang et al., “Feature-oriented domain analysis (FODA): feasibility study,” Carnegie Mellon Univ., Software Eng. Inst., Tech. Rep., 1990.
[4] M. Schreier, “Environment representations for automated on-road vehicles,” at - Automatisierungstechnik, vol. 66, no. 2, pp. 107–118, feb 2018.
[5] M. Bansal et al., “Chauffeurnet: Learning to drive by imitating the best and synthesizing the worst,” 2018, http://arxiv.org/abs/1812.03079.
[6] M. Bojarski et al., “End to end learning for self-driving cars,” 2016, http://arxiv.org/abs/1604.07316.
[7] N. Zimmerman et al., “Implementing a rule-based system to represent decision criteria for on-road autonomous navigation,” in 2004 AAAI Spring Symp. on Knowledge Representation and Ontologies for Autonomous Systems, 2004.
[8] P. Bender et al., “Lanelets: Efficient map representation for autonomous driving,” in 2014 IEEE Intelligent Vehicles Symp. (IV), June 2014, pp. 420–425.
[9] C. Paxton et al., “Combining neural networks and tree search for task and motion planning in challenging environments,” in 2017 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), September 2017, pp. 6059–6066.
[10] K. P. Murphy, Machine Learning, A Probabilistic Perspective. MIT Press, 2012.
[11] B. Ichter et al., “Learning sampling distributions for robot motion planning,” Sep 2017, http://arxiv.org/abs/1709.05448.
[12] D. Ha and J. Schmidhuber, “World models,” 2018, http://arxiv.org/abs/1803.10122.
[13] C. Urmson et al., “Autonomous driving in urban environments: Boss and the Urban Challenge,” Journal of Field Robotics Special Issue on the 2007 DARPA Urban Challenge, Part I, vol. 25, no. 8, pp. 425–466, June 2008.
[14] A. Bacha et al., “Odin: Team VictorTango’s entry in the DARPA urban challenge,” Journal of field Robotics, vol. 25, no. 8, pp. 467–492, 2008.
[15] C. Urmson et al., “Tartan Racing: A multi-modal approach to the DARPA urban challenge,” Carnegie Mellon Univ., Tech. Rep., 2007. [Online]. Available: http://repository.cmu.edu/robotics/967
[16] J. Wei et al., “A behavioral planning framework for autonomous driving,” in IEEE Intelligent Vehicles Symp. Proc. IEEE, 2014, pp. 458–464.
[17] M. Kuderer et al., “Learning driving styles for autonomous vehicles from demonstration,” in 2015 IEEE Int. Conf. on Robotics and Automation (ICRA), May 2015, pp. 2641–2646.
[18] P. Wolf et al., “Learning how to drive in a real world simulation with deep Q-networks,” in 2017 IEEE Intelligent Vehicles Symp. (IV), June 2017, pp. 244–250.
[19] A. El Sallab et al., “Deep reinforcement learning framework for autonomous driving,” Electronic Imaging, no. 19, pp. 70–76, 2017.
[20] T. Gindele et al., “Learning driver behavior models from traffic observations for decision making and planning,” IEEE Intelligent Transportation Systems Magazine, vol. 7, no. 1, pp. 69–79, 2015.
[21] W. Zhan et al., “A non-conservatively defensive strategy for urban autonomous driving,” in Intelligent Transportation Systems (ITSC), 2016 IEEE 19th Int. Conf. on. IEEE, 2016, pp. 459–464.
[22] A. Sarkar et al., “Trajectory prediction of traffic agents at urban intersections through learned interactions,” in 2017 IEEE 20th Int. Conf. on Intelligent Transportation Systems (ITSC). IEEE, 2017, pp. 1–8.
[23] S. Lef`evre et al., “A survey on motion prediction and risk assessment for intelligent vehicles,” ROBOMECH Journal, vol. 1, no. 1, p. 1, Jul 2014.
[24] E. K¨afer et al., “Recognition of situation classes at road intersections,” in IEEE Int. Conf. on Robotics and Automation (ICRA). IEEE, 2010, pp. 3960–3965.
[25] S. Koehler et al., “Stationary detection of the pedestrian intention at intersections,” IEEE Intelligent Transportation Systems Magazine, vol. 5, no. 4, pp. 87–99, 2013.
[26] C. G. Keller and D. M. Gavrila, “Will the pedestrian cross? A study on pedestrian path prediction,” IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 2, pp. 494–506, 2014.
[27] R. Quintero et al., “Pedestrian path prediction using body language traits,” in Intelligent Vehicles Symp. Proc., 2014 IEEE. IEEE, 2014, pp. 317–323.
[28] T. Bandyopadhyay et al., “Intention-aware motion planning,” in Algorithmic foundations of robotics X. Springer, 2013, pp. 475–491.
[29] Z. Wang et al., “Probabilistic movement modeling for intention inference in human–robot interaction,” Int. Journal of Robotics Research, vol. 32, no. 7, pp. 841–858, 2013.
[30] H. Bai et al., “Intention-aware online POMDP planning for autonomous driving in a crowd,” in 2015 IEEE Int. Conf. on Robotics and Automation (ICRA). IEEE, 2015, pp. 454–460.
[31] J. Van Den Berg et al., “Motion planning under uncertainty using differential dynamic programming in belief space,” in Robotics Research. Springer, 2017, pp. 473–490.
[32] Z. C. Lipton, “The mythos of model interpretability,” 2016, http://arxiv.org/abs/1606.03490.
[33] Y. Liu and U. Ozguner, “Human driver model and driver decision making for intersection driving,” in 2007 IEEE Intelligent Vehicles Symp. (IV), June 2007, pp. 642–647.
[34] M. Maurer and E. D. Dickmanns, “A system architecture for au- tonomous visual road vehicle guidance,” in Proc. of Conf. on Intelligent Transportation Systems, November 1997, pp. 578–583.
[35] M. Lahijanian et al., “Automatic deployment of autonomous cars in a robotic urban-like environment (RULE),” in 2009 IEEE Int. Conf. on Robotics and Automation (ICRA), May 2009, pp. 2055–2060.
[36] M. Ardelt et al., “Highly automated driving on freeways in real traffic using a probabilistic framework,” IEEE Transactions on Intelligent Transportation Systems, vol. 13, no. 4, pp. 1576–1585, December 2012.
[37] J. P´erez Rastelli and M. Santos Peas, “Fuzzy logic steering control of autonomous vehicles inside roundabouts,” Applied Software Computer, vol. 35, no. 1, pp. 662–669, Jul 2015.
[38] W. Liu et al., “Situation-aware decision making for autonomous driving on urban road using online POMDP,” in Intelligent Vehicles Symp. (IV), 2015, pp. 1126–1133.
[39] K. H. Wray et al., “Online decision-making for scalable autonomous systems,” in 26th International Joint Conference of Artificial Intelligence (IJCAI), 2017, pp. 4768–4774.
[40] E. Galceran et al., “Multipolicy decision-making for autonomous driving via changepoint-based behavior prediction: Theory and experiment,” Autonomous Robots, vol. 41, no. 6, pp. 1367–1382, Aug 2017.
[41] C. Hubmann et al., “Automated driving in uncertain environments: Planning with interaction and uncertain maneuver prediction,” IEEE Transactions on Intelligent Vehicles, vol. 3, no. 1, pp. 5–17, 2018.
[42] S. Brechtel et al., “Probabilistic decision-making under uncertainty for autonomous driving using continuous POMDPs,” 17th International IEEE Conference on Intelligent Transportation Systems (ITSC), pp. 392–399, Oct 2014.
[43] K. ¨Ofj¨all and M. Felsberg, “Biologically inspired online learning of visual autonomous driving,” in British Machine Vision Conf. 2014, Nottingham, UK September 1-5 2014. BMVA Press, 2014, pp. 137– 156.
[44] K. ¨Ofj¨all et al., “Visual autonomous road following by symbiotic online learning,” in 2016 IEEE Intelligent Vehicles Symp. (IV), June 2016, pp. 136–143.
[45] J. Koutn´ık et al., “Evolving large-scale neural networks for visionbased reinforcement learning,” in 15th Annual Conf. on Genetic and Evolutionary Computation (GECCO). ACM, 2013, pp. 1061–1068.
[46] S. Shalev-Shwartz et al., “Safe, multi-agent, reinforcement learning for autonomous driving,” 2016, http://arxiv.org/abs/1610.03295.
[47] D. A. Pomerleau, “Alvinn: An autonomous land vehicle in a neural network,” in Advances in neural information processing systems, 1989, pp. 305–313.
[48] U. Muller et al., “Off-road obstacle avoidance through end-to-end learning,” in Advances in neural information processing systems, 2006, pp. 739–746.
[49] Y. Chen et al., “Learning on-road visual control for self-driving vehicles with auxiliary tasks,” 2018, http://arxiv.org/abs/1812.07760.
[50] B. D. Ziebart et al., “Maximum entropy inverse reinforcement learning.” in AAAI, vol. 8. Chicago, IL, USA, 2008, pp. 1433–1438.
[51] S. Ross et al., “A reduction of imitation learning and structured prediction to no-regret online learning,” in 14th Int. Conf. on artificial intelligence and statistics, 2011, pp. 627–635.
[52] L. Sun et al., “A fast integrated planning and control framework for autonomous driving via imitation learning,” in ASME 2018 Dynamic Systems and Control Conf. American Society of Mechanical Engineers, 2018.
[53] F. Codevilla et al., “End-to-end driving via conditional imitation learning,” in 2018 IEEE Int. Conf. on Robotics and Automation (ICRA). IEEE, 2018, pp. 1–9.
[54] P. Abbeel et al., “Apprenticeship learning for motion planning with application to parking lot navigation,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS). IEEE, 2008, pp. 1083–1090.
[55] D. Silver et al., “Learning from demonstration for autonomous navigation in complex unstructured terrain,” Int. Journal of Robotics Research, vol. 29, no. 12, pp. 1565–1592, 2010.
[56] J. Michels et al., “High speed obstacle avoidance using monocular vision and reinforcement learning,” in 22nd Int. Conf. on Machine Learning. ACM, 2005, pp. 593–600.
[57] L. Yu et al., “Autonomous overtaking decision making of driverless bus based on deep Q-learning method,” in 2017 IEEE Int. Conf. on Robotics and Biomimetics (ROBIO). IEEE, 2017, pp. 2267–2272.
[58] T. P. Lillicrap et al., “Continuous control with deep reinforcement learning,” 2015, http://arxiv.org/abs/1509.02971.
[59] X. Xiong et al., “Combining deep reinforcement learning and safety based control for autonomous driving,” 2016, http://arxiv.org/abs/1612.00147.
[60] J. Kirkpatrick et al., “Overcoming catastrophic forgetting in neural networks,” National Academy of Sciences, pp. 3521–3526, 2017.
[61] Z. C. Lipton et al., “Combating reinforcement learning’s Sisyphean curse with intrinsic fear,” 2016, http://arxiv.org/abs/1611.01211.
[62] M. L. Littman et al., “Environment-independent task specifications via GLTL,” 2017, http://arxiv.org/abs/1704.04341.
[63] D. Sadigh et al., “A learning based approach to control synthesis of Markov decision processes for linear temporal logic specifications,” in Decision and Control (CDC), 2014 IEEE 53rd Annual Conf. on. IEEE, 2014, pp. 1091–1096.
[64] S. Shalev-Shwartz et al., “Safe, multi-agent, reinforcement learning for autonomous driving,” 2016, http://arxiv.org/abs/1610.03295.
[65] R. Liaw et al., “Composing meta-policies for autonomous driving using hierarchical deep reinforcement learning,” 2017, http://arxiv.org/abs/1711.01503.
[66] G. Kahn et al., “Uncertainty-aware reinforcement learning for collision avoidance,” 2017, http://arxiv.org/abs/1702.01182.
[67] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT press, 2018.
[68] F. Gritschneder et al., “Adaptive learning based on guided exploration for decision making at roundabouts,” in 2016 IEEE Intelligent Vehicles Symp. (IV), June 2016, pp. 433–440.
[69] D. S. Gonz´alez et al., “Interaction-aware driver maneuver inference in highways using realistic driver models,” in 2017 IEEE 20th Int. Conf. on Intelligent Transportation Systems (ITSC), October 2017, pp. 1–8.
[70] R. Grunitzki et al., “Individual versus difference rewards on reinforcement learning for route choice,” in 2014 Brazilian Conf. on Intelligent Systems, October 2014, pp. 253–258.