One of the longstanding issues in Evolutionary Robotics (ER) research [26] is how to assess a phenotype and reward it with a suitable fitness. There are two generally accepted methods: either (i) simulate the robot together with its environment, or (ii) physically test the robot in the real world. Simulation (e.g., [19]) is a popular choice as it is parallelisable and, depending on model complexity, may run many times faster than real-time. However, simulation suffers from the 'reality gap' [22], whereby the necessarily abstracted physical laws of the simulation inaccurately represent real-world conditions, resulting in performance degradation when a controller is transferred from the former to the latter. Early efforts to combat this effect focused on the application of suitable levels of noise [22]. More recent research includes selecting controllers that are transferable (i.e., whose simulated performance is close to their real performance) [23], and coevolutionary methods that use real measurements to inform the simulator [3], which can be seen as a hybrid of the two approaches.
Conversely, physical testing (e.g., [25]) guarantees that the results of evolution work in reality, capturing dynamics and physical effects that may be missing from a simulator. In this scenario, optimisation times are long, as evaluations are inherently limited to real-time, and repeatable test environments need to be engineered to ensure fair test conditions. Additionally, working with real robots raises a number of practical issues, as highlighted by early work [16] that approximated a differential-drive robot using a gantry-mounted camera, which was simpler to reset and maintain. In general, the choice of simulation vs. reality can be framed as a trade-off between evaluation speed (how quickly we can evolve a controller) and performance (how well it works in the real world).
The issues of performing ER with real robots are exacerbated when flying robots are considered, as testing stochastically generated controllers on real flying robots can be destructive. Recently, the evolution of controllers directly on real flying robots (specifically the popular and versatile hexacopter Unmanned Aerial Vehicle (UAV)) has been made possible through a platform that combines physical tethers with real-time monitoring of, and recovery from, dangerous states [20]. The platform safely, repeatedly and non-destructively evolves controllers for UAVs directly on the robot (i.e., without modelling), which provides the benefits that (i) the controllers are guaranteed to work on the UAV in reality, and (ii) the effects of the UAV's hardware state (e.g., propeller wear, payload configurations) on the flight dynamics are implicitly captured. We can describe this platform as having high performance but low speed, and increasing the speed of evolution on this platform is the focus of this research.
As we cannot significantly increase the speed of an individual evaluation without modelling (which we preclude, as providing a UAV model of sufficient fidelity to accurately capture the physical reality of every conceivable payload and hardware state is unrealistic), we instead consider reducing the number of evaluations required. Self-Adaptive (SA) mutation (e.g., [8]) is a promising approach that has previously been used to reduce the number of generations required to generate high-fitness solutions in simulated ER experiments [21], and has shown promise in hardware ER [15]. SA learning rates (e.g., mutation, crossover) can adapt to the instantaneous requirements of the problem in a context-sensitive manner, not only at the start of the experiment but throughout the evolutionary process. SA is particularly suited to our problem, as the platform will optimise myriad different UAVs and payloads, each of which is likely to prefer different learning rates.
An issue with SA mutation is fitness stagnation, caused by a combination of suboptimal learning rates and locally optimal controllers that cannot improve because their rates are suboptimal. In the context of ER, this is especially problematic, as any wasted experimental time is real time. Rate restarts have been shown to be an effective technique for dissuading such behaviour [25]. The question is: when population-based EAs are considered, should the mutation rates be restarted based on population fitness stagnation, or on individual fitness stagnation?
In this paper we present the results of an experiment that seeks to answer this question. We test an individual-level restart strategy and a population-level restart strategy, and compare them against two benchmarks: static rates, and self-adaptive rates with no restarts. The performance of each strategy is assessed on a task in which a real hexacopter is optimised for hovering behaviour in the presence of a significant wind disturbance.
In this section we review research in two relevant areas: Evolutionary Robotics and Self-Adaptation.
2.1 ER with Flying Robots
Due to the potentially destructive nature of stochastically optimising controllers for flying robots, simulation is popular [4, 6, 19, 27, 28]. Simulation also allows evaluation to occur faster than real-time; however, the faster the simulation, the more abstracted the underlying model of reality tends to be. As a result, controllers transferred from simulation to reality can fail to cross the 'reality gap', e.g., [29]. This is evident even in recent work that evolved behaviour trees to allow a micro UAV to escape from a room by flying through a window [30]: the simulated escape rate of 88% fell to 46% in reality, and could only be increased to 54% through manual rule tweaking.
Attempts to directly evolve control for real flying robots are limited. A blimp controller has been successfully evolved [12], but the slow dynamics of the blimp simplify recovery from dangerous states. Control of a miniature helicopter has also been evolved [14], although only height and yaw control were optimised.
Coevolutionary methods have been applied to force quadrotor models (represented using Genetic Programming trees) to match real recorded flight data in a system-identification approach [18]; however, the experimentation focuses on modelling rather than controller optimisation.
Controllers have been evolved on real hexacopters using a Bee Colony Algorithm [13], which is demonstrated to work as both an online and an offline optimiser, with only small performance differences between the two modes. However, state estimation requires an expensive infrared tracking system, and frequent human intervention is required, e.g., to change batteries.
Recently, a platform has been demonstrated that allows safe and repeatable 24/7 controller optimisation of any multi-rotor (within certain size limitations) [20]. As controllers are evolved directly on the hardware, they are guaranteed to work on the real robot, accounting for any attached payload and hardware variability. However, evaluation is limited to real-time. To improve the efficiency of the platform, we propose self-adaptation as a method of reducing the number of evaluations required.
2.2 Self-Adaptation
Self-adaptation (see, e.g., [8] for an overview) allows key evolutionary parameters to vary throughout the optimisation process, so that suitable rates can be found for the instantaneous evolutionary state. Due to their real-time limitation, hardware ER experiments typically allow only a small number of fitness evaluations [9]. SA has been used in simulation to reduce the time spent evaluating the population by explicitly optimising controller evaluation times [7]. Further research [9] identifies three parameters that are commonly varied: population size, mutation rate, and the controller evaluation period, together with a re-evaluation rate, which is less commonly varied but is necessary in online scenarios to achieve more reliable fitness estimates. The authors conclude that mutation rate has the most significant effect in reducing evaluation times. It is therefore mutation rates that we focus on in this study.
Fitness stagnation is a common problem when using SA, as rates may be set that prevent the globally optimal solution from being located. We note the effectiveness of rate restarts [25] in countering the effects of fitness stagnation. Performance-based restarting of unfavourable rates is shown to (i) dissuade premature convergence into unfavourable areas of the rate space, and (ii) 'rescue' the optimisation process from unfavourable rate settings. As [25] uses a simple (1+1) Evolution Strategy [10], the authors do not consider the different effects that may be observed if the rate restart is applied at the level of the individual vs. the level of the population. As fitness stagnation is still an issue with population-based SA, this question is particularly relevant for our application.
For our purposes, an individual-level restart resets the mutation rates of an individual if its fitness does not improve for n consecutive generations. A population-level restart resets every individual's mutation rates if none of the individuals generates a fitness improvement for n consecutive generations.
Conceptually, it is not obvious which would be preferable. Individual restarts may be too unstable when combined with the self-adaptation of the rates themselves, but they respond immediately to the stagnation of an individual. Conversely, population-level restarts present a more stable evolutionary process, but the focus on improving global fitness means that individuals that are globally suboptimal, but have good rate settings, may be adversely affected.
Figure 1: The platform, showing (1) the fan, (2) camera, (3) hexacopter, (4) physical tether, (5) data/power tether, and (6) light. The camera height is 200cm and the padded floor area is 271cm². The light grey floor area depicts the standard flight area, which extends 20cm in z.
We are motivated to investigate the effects of these two restart strategies on the performance of an ER experiment, and intend to produce results that will inform the use of rate restarts by other researchers.
Experimentation occurs on our optimisation platform. We refer the interested reader to our previous research [20] for a full algorithmic description, and to a similar platform for multi-legged robots [17]. Briefly, the platform comprises a solid floor covered with foam matting. The hexacopter is anchored to the floor with nylon wires, so that flipping (tilt angles > 60°) and excessive rotation are physically prohibited. An LED strip light and a camera are mounted atop a mesh-covered metal frame, which stands over the floor. An oscillating fan provides wind disturbances of 5m/s, with an oscillation period of 10 seconds and a total traversal angle of 120°. A 24V cable provides constant power, and a serial cable connects to the host PC, which manages and monitors experiments using the real-time Extended State Machine (ESM) framework [24]. See Fig. 1.
The platform evolves a population of hexacopter controllers. We use Proportional-Integral-Derivative (PID) controllers [1], as they are a de facto representation and are compatible with most commercially available flight controllers, which increases the generality of the platform. PIDs have previously been shown to be amenable to evolutionary optimisation; see [11] for a survey.
A two-loop PID structure controls the hexacopter's position and attitude; see Fig. 2. Horizontal position (North and East) is controlled by the outer loop, and attitude (roll, pitch and yaw) and height h by the inner loop. The outer-loop PIDs generate roll and pitch setpoints for the inner loop. The inner-loop outputs represent commanded changes in attitude and thrust; they are scaled to the range of attainable motor PWM signals (with a maximum of 2000) and passed to a linear mixer, which produces one command per motor.
PID control minimises the error e between the hexacopter's estimated position and attitude and the current waypoint, following (1). There are six PIDs in all, as the waypoint is represented by a 6-tuple of setpoints for attitude (roll, pitch, yaw) and position (North, East, height). Before being input to its PID, each error is clipped to a maximum magnitude, with separate limits for attitude and for the translational variables (e.g., 15cm).

$$o(t) = K_p\, e(t) + K_i \int_0^t e(\tau)\, d\tau + K_d\, \frac{de(t)}{dt} \qquad (1)$$

Here, o is the PID output, t is the instantaneous time, τ is the integration variable running from 0 to t, and K_p, K_i and K_d are controller parameters that define the response of the controller to the raw error, the integral of the error, and the derivative of the error respectively. With three gains per PID and six PIDs, a controller is represented by 18 reals.
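To make (1) concrete, the following minimal Python sketch shows one discrete-time PID with error clipping, and one way an 18-real genome could map onto six PIDs. The rectangular-rule integral, the backward-difference derivative, the class name ClippedPID and the genome ordering (three consecutive gains per PID) are illustrative assumptions, not the platform's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ClippedPID:
    """One PID following (1): o = Kp*e + Ki*integral(e) + Kd*de/dt."""
    kp: float
    ki: float
    kd: float
    e_max: float                       # maximum error magnitude fed to the PID
    integral: float = 0.0
    prev_error: Optional[float] = None

    def step(self, setpoint: float, estimate: float, dt: float) -> float:
        # Clip the raw error to [-e_max, e_max] before it enters the PID.
        error = max(-self.e_max, min(self.e_max, setpoint - estimate))
        # Rectangular-rule approximation of the integral term.
        self.integral += error * dt
        # Backward-difference derivative; zero on the first call.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def controller_from_genome(genome: Sequence[float],
                           error_limits: Sequence[float]) -> List[ClippedPID]:
    """Split an 18-real genome into six PIDs, three gains (Kp, Ki, Kd) each."""
    if len(genome) != 18 or len(error_limits) != 6:
        raise ValueError("expected 18 gains and 6 error limits")
    return [ClippedPID(genome[3 * i], genome[3 * i + 1], genome[3 * i + 2],
                       e_max=error_limits[i]) for i in range(6)]
```

For example, ClippedPID(2.0, 1.0, 0.5, e_max=10.0).step(1.0, 0.0, 0.5) returns 2.5 (proportional term 2.0 plus integral term 0.5; the derivative is zero on the first call).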
Controllers are evolved using a self-adaptive Differential Evolution (DE) algorithm. Specifically, we use DE/rand/1/bin, as it has shown promising results in evolving PID gains [2, 5]. In each generation, a donor vector v is created for each 'parent' individual p as in (2), where F (0 < F ≤ 2) is the differential weight, and r1, r2 and r3 are unique individuals selected uniformly at random.

$$v = r_1 + F\,(r_2 - r_3) \qquad (2)$$

A 'child' vector c is created by probabilistically replacing elements of p with elements of v. For each vector index i,

$$c_i = \begin{cases} v_i & \text{if } i = R \text{ or } \mathrm{rand}_i < CR \\ p_i & \text{otherwise} \end{cases}$$

where rand_i is a uniform-random number between 0 and 1, CR (0 ≤ CR ≤ 1) is the crossover rate, and R is a random vector index, ensuring that c receives at least one element of v. The children are evaluated and assigned a fitness f, with c replacing its parent p if f_c is superior to f_p. When every child has been evaluated, the next generation begins.
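A minimal sketch of one DE/rand/1/bin generation follows. It assumes that higher fitness is better (fitness accumulates during flight), that r1, r2 and r3 are also distinct from the parent, and that replacement happens only after all children have been evaluated; the function names and the evaluate callable are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def de_rand_1_bin_child(population: Sequence[Vector], parent_idx: int,
                        F: float, CR: float, rng: random.Random) -> Vector:
    """Build one child vector c for parent p with DE/rand/1/bin."""
    parent = population[parent_idx]
    # r1, r2, r3: three distinct individuals, none of them the parent.
    others = [i for i in range(len(population)) if i != parent_idx]
    r1, r2, r3 = rng.sample(others, 3)
    # Donor vector, following (2): v = r1 + F * (r2 - r3).
    donor = [a + F * (b - c) for a, b, c in
             zip(population[r1], population[r2], population[r3])]
    # Binomial crossover: index R always comes from the donor.
    R = rng.randrange(len(parent))
    return [donor[i] if (i == R or rng.random() < CR) else parent[i]
            for i in range(len(parent))]


def de_generation(population: List[Vector], fitness: List[float],
                  F: float, CR: float, evaluate: Callable[[Vector], float],
                  rng: random.Random) -> None:
    """One generation; a child replaces its parent only if strictly fitter."""
    # All children are built from the current population before any replacement.
    children = [de_rand_1_bin_child(population, i, F, CR, rng)
                for i in range(len(population))]
    child_fitness = [evaluate(c) for c in children]
    for i, (child, fc) in enumerate(zip(children, child_fitness)):
        if fc > fitness[i]:
            population[i], fitness[i] = child, fc
```

With the static baseline, F=0.8 and CR=0.5 would be passed for every individual; in the self-adaptive variants each parent would instead supply its own rates.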
Self-adaptation is based on an Evolution Strategy, following, e.g., [10], to allow more straightforward comparisons with previous work using evolution strategy operators [25]. New population members initialise their CR and F uniformly at random within their bounds. Child individuals copy their parent's CR and F and modify them following (3), respecting the bounds. The static baseline rates used for comparison are CR=0.5 and F=0.8, following a brief parameter sweep [20].
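Equation (3) is not reproduced in this text, so the sketch below substitutes a common ES-style log-normal rule, x' = x·exp(τ·N(0,1)), purely as an assumption; τ = 0.2 and the bounds CR ∈ [0,1] and F ∈ [0,2] are likewise assumed. It illustrates uniform initialisation of new members and copy-then-perturb inheritance for children, with the results clamped to the bounds.

```python
import math
import random
from typing import Tuple

CR_BOUNDS = (0.0, 1.0)   # assumed bounds for the crossover rate
F_BOUNDS = (0.0, 2.0)    # assumed bounds for the differential weight


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def init_rates(rng: random.Random) -> Tuple[float, float]:
    """New population members draw CR and F uniformly within their bounds."""
    return rng.uniform(*CR_BOUNDS), rng.uniform(*F_BOUNDS)


def inherit_rates(parent_cr: float, parent_f: float, rng: random.Random,
                  tau: float = 0.2) -> Tuple[float, float]:
    """A child copies its parent's rates, then perturbs them.

    ASSUMPTION: log-normal rule x' = x * exp(tau * N(0, 1)) stands in for (3).
    """
    cr = parent_cr * math.exp(tau * rng.gauss(0.0, 1.0))
    f = parent_f * math.exp(tau * rng.gauss(0.0, 1.0))
    # Respect the bounds by clamping.
    return clamp(cr, *CR_BOUNDS), clamp(f, *F_BOUNDS)
```

A multiplicative rule keeps rates positive and scales perturbations with the current value; an additive Gaussian rule would be an equally plausible stand-in.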
5.1 Restart Strategies
An individual is represented by a controller, plus its fitness f, rates CR and F, and a restart counter r, which is initially 0. For individual-level restarts, r is incremented for a parent when its child does not replace it. For population-level restarts, a global r is incremented for each consecutive generation in which the best population fitness does not improve. Global restarts emphasise global fitness improvements by restarting all rates at the same instant, whereas individual-level restarts encourage each individual to improve itself without consideration of global population performance.
Figure 2: PID control structure, showing the attitude and position loops. The labelled parameters denote the error limits for height, yaw and attitude, the minimum and maximum motor commands, and the command inputs to a mixer, which outputs the speed controller commands. The disturbance is input from the fan.
A restart is triggered when r = 5. Individual-level restarts affect only the individual's rates, whereas population-level restarts simultaneously affect every member of the population. A restart reinitialises CR and F uniformly at random within their respective ranges and resets r to 0.
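The two restart triggers can be summarised in a short Python sketch. Individual, indiv_update and GlobalRestart are hypothetical names; we assume that a child which replaces its parent starts with r = 0, that the population-level trigger tracks the best fitness in the population, and that the rate bounds are CR ∈ [0,1] and F ∈ [0,2].

```python
import random
from dataclasses import dataclass
from typing import List

RESTART_THRESHOLD = 5                          # restart when r reaches 5
CR_BOUNDS, F_BOUNDS = (0.0, 1.0), (0.0, 2.0)   # assumed rate bounds


@dataclass
class Individual:
    genome: List[float]
    fitness: float
    cr: float
    f: float
    r: int = 0                                 # restart counter, initially 0


def restart_rates(ind: Individual, rng: random.Random) -> None:
    """Reinitialise CR and F uniformly within their ranges and reset r."""
    ind.cr = rng.uniform(*CR_BOUNDS)
    ind.f = rng.uniform(*F_BOUNDS)
    ind.r = 0


def indiv_update(ind: Individual, child_replaced_parent: bool,
                 rng: random.Random) -> bool:
    """INDIV: count consecutive generations in which this parent survived."""
    if child_replaced_parent:
        ind.r = 0              # assumption: the new occupant starts at r = 0
        return False
    ind.r += 1
    if ind.r >= RESTART_THRESHOLD:
        restart_rates(ind, rng)                # affects this individual only
        return True
    return False


class GlobalRestart:
    """GLOBAL: one counter for the population, driven by the best fitness."""

    def __init__(self) -> None:
        self.best = float("-inf")
        self.r = 0

    def update(self, population: List[Individual], rng: random.Random) -> bool:
        best_now = max(ind.fitness for ind in population)
        if best_now > self.best:
            self.best, self.r = best_now, 0
            return False
        self.r += 1
        if self.r >= RESTART_THRESHOLD:
            for ind in population:             # every member restarts at once
                restart_rates(ind, rng)
            self.r = 0
            return True
        return False
```

The contrast between the strategies is visible in the loops: INDIV touches a single individual per trigger, whereas GLOBAL resets all members together but triggers only when the population's best fitness stalls.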
Note that this is a fitness-based restart, as opposed to a rate-magnitude-based restart [25], in which the mutation step size alone triggers a restart. As we use two rates, we consider their combined effect, which is neatly captured by the ability of an individual (or the population) to consistently improve its fitness over consecutive generations.
Performance is evaluated on a wind-affected hover scenario with a total evaluation length of 40s. The hexacopter attempts to follow a series of five waypoints; the target waypoint changes deterministically every 8s. The waypoints are designed to sufficiently excite all six of the hexacopter's degrees of freedom; see Fig. 3.
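The deterministic schedule (five waypoints of 8s each, 40s in total) can be sketched as a time-indexed lookup. The coordinates are one reading of the trajectory figure, assuming the first waypoint lies at the centre of the floor, relative moves accumulate, and height and yaw are held unless stated otherwise; all names are hypothetical.

```python
from typing import Tuple

Waypoint = Tuple[float, float, float, float]   # (north_cm, east_cm, height_cm, yaw_deg)

WAYPOINT_PERIOD_S = 8.0
EVALUATION_S = 40.0

# One reading of the trajectory figure (assumption), West and South negative.
WAYPOINTS: Tuple[Waypoint, ...] = (
    (0.0, 0.0, 10.0, 40.0),    # (1) hover at 10cm, yaw 40
    (8.0, -8.0, 10.0, 0.0),    # (2) 8cm North, 8cm West, yaw 0
    (-8.0, 8.0, 14.0, 0.0),    # (3) up to 14cm, 16cm South and 16cm East
    (0.0, 0.0, 14.0, 80.0),    # (4) back to the centre, yaw 80
    (0.0, 0.0, 14.0, 40.0),    # (5) yaw to 40
)


def active_waypoint(t: float) -> Waypoint:
    """Return the target waypoint at time t (s) since the evaluation started."""
    if not 0.0 <= t < EVALUATION_S:
        raise ValueError("t must lie within the 40s evaluation")
    return WAYPOINTS[int(t // WAYPOINT_PERIOD_S)]
```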
6.1 Initialisation
Figure 3: The trajectory flown by the hexacopter. Waypoints change every 8s and are: (1) hover at a height of 10cm with a yaw of 40°, (2) move 8cm North and 8cm West with a yaw of 0°, (3) increase height to 14cm and move 16cm South and 16cm East, (4) return to the centre of the cage with a yaw of 80°, (5) alter yaw to 40°. The fan can be seen in the top-left of the image.
At the start of an experiment, controllers are randomly generated, briefly evaluated, and added to the population if they allow the hexacopter to stay in the air for > 0.2s. When the population size reaches N = 20, the first generation begins. Initial control parameter ranges are calculated using (4), based on a generalised maximum possible command (PWM) for each of the control parameters.
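The initialisation procedure amounts to rejection sampling: draw random controllers until N = 20 of them keep the hexacopter airborne for more than 0.2s. Equation (4) is not reproduced here, so sample_controller and flight_time are hypothetical callables standing in for the parameter-range sampling and the brief hardware evaluation; the max_attempts guard is an added assumption.

```python
from typing import Callable, List

POPULATION_SIZE = 20        # N
MIN_FLIGHT_S = 0.2          # must stay airborne for more than 0.2s

Genome = List[float]


def initialise_population(sample_controller: Callable[[], Genome],
                          flight_time: Callable[[Genome], float],
                          max_attempts: int = 10_000) -> List[Genome]:
    """Draw random controllers, keeping those that fly for > 0.2s, until N = 20."""
    population: List[Genome] = []
    for _ in range(max_attempts):
        genome = sample_controller()            # hypothetical: samples the ranges of (4)
        if flight_time(genome) > MIN_FLIGHT_S:  # hypothetical: brief hardware evaluation
            population.append(genome)
            if len(population) == POPULATION_SIZE:
                return population
    raise RuntimeError("population not filled within max_attempts")
```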
6.2 State Estimation
The hexacopter's state is estimated at rates of up to 400Hz. The full state vector comprises attitude Euler angles (roll, pitch and yaw, at 400Hz), angular rates (at 400Hz) and height h (at 20Hz), together with North and East position and velocities (all at 60Hz). Range limits are provided in Appendix A.
An inertial measurement unit calculates the Euler angles and, together with a frame-mounted ultrasonic rangefinder, the height. Attitude angles are processed through a Kalman filter, and height through a complementary filter. Position is measured using a machine vision camera. Angular rates are derived from two consecutive Euler angle estimates, and velocities are calculated through a linear regression over five consecutive position estimates. This provides a 3D position error of <5mm and a bounded heading error. Position and attitude are used by the controller; the full state estimate is used to assign fitness and perform health monitoring.
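The rate estimates described above can be illustrated as follows: velocity as the least-squares slope of five consecutive 60Hz position samples, and angular rate as a finite difference of two consecutive 400Hz Euler-angle samples. This is a sketch under those assumptions (equally spaced samples, no wrap-around handling for yaw), not the platform's estimator.

```python
from typing import Sequence


def regression_velocity(positions: Sequence[float], rate_hz: float = 60.0) -> float:
    """Least-squares slope of equally spaced position samples (units per second)."""
    n = len(positions)
    if n < 2:
        raise ValueError("need at least two samples")
    dt = 1.0 / rate_hz
    times = [i * dt for i in range(n)]
    t_mean = sum(times) / n
    p_mean = sum(positions) / n
    num = sum((t - t_mean) * (p - p_mean) for t, p in zip(times, positions))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den


def finite_difference_rate(prev_angle: float, angle: float,
                           rate_hz: float = 400.0) -> float:
    """Angular rate from two consecutive Euler-angle samples (no wrap-around)."""
    return (angle - prev_angle) * rate_hz
```

For example, positions of 0, 0.5, 1.0, 1.5 and 2.0cm sampled at 60Hz give a velocity of 30cm/s. Regressing over five samples smooths camera noise at the cost of a short lag.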
6.3 Fitness Assignment
During an evaluation, fitness accumulates at 400Hz by adding a per-step fitness measure (max. 10) to a running total f (max. 160,000). The composition of the per-step measure is depicted in Appendix B. In brief, a high f corresponds to the hexacopter's state closely matching the position and attitude setpoints of the current waypoint.
To account for noisy fitness assessments caused by imperfect sensors, any controller that completes the full 40s evaluation is immediately re-evaluated and assigned the mean fitness of the two evaluations. If the controller completes both evaluations, it is said to be a success. Successful controllers are reset to their start position (the centre of the floor area) to ensure a fair test between controllers; before this point, we are more interested in discovering controllers that can fly than in accurately comparing controller performance within the population.
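The bound on f follows from the sampling: 400Hz × 40s = 16,000 steps, each contributing at most 10, gives 160,000. The sketch below accumulates per-step scores and applies the re-evaluation rule; evaluate is a hypothetical callable returning the accumulated fitness and whether the full 40s was completed, and capping each step at 10 is a defensive assumption.

```python
from typing import Callable, Iterable, Tuple

RATE_HZ = 400
EVALUATION_S = 40
MAX_STEP_FITNESS = 10.0
MAX_TOTAL_FITNESS = RATE_HZ * EVALUATION_S * MAX_STEP_FITNESS  # 160,000


def accumulate_fitness(step_scores: Iterable[float]) -> float:
    """Running total of per-step scores, each capped at 10 (defensive assumption)."""
    total = 0.0
    for score in step_scores:
        total += min(score, MAX_STEP_FITNESS)
    return total


def assign_fitness(evaluate: Callable[[], Tuple[float, bool]]) -> Tuple[float, bool]:
    """Return (fitness, success); evaluate() is hypothetical and returns
    (accumulated fitness, whether the full 40s evaluation was completed)."""
    f1, completed = evaluate()
    if not completed:
        return f1, False                       # terminated: keep accumulated fitness
    f2, completed_again = evaluate()           # immediate re-evaluation
    return (f1 + f2) / 2.0, completed_again    # success needs both to complete
```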
6.4 Health Monitoring
ESM monitors dangerous hexacopter states, and safely terminates an evaluation if any of the following is detected: velocity limits of 50cm/s and 25cm/s exceeded, a 15° angular limit exceeded, the maximum yaw error of 45° exceeded, the maximum current draw of 15A exceeded, or the maximum rate for the upper PWM limit (75) exceeded. As well as on dangerous states, termination also occurs if the hexacopter does not move during the first 5s of an evaluation (a time-saving measure), or if it lands during an evaluation (touches the ground for >1s) having previously been flying. If a flight is terminated, the controller is assigned its current accumulated fitness.
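A sketch of the termination check for the conditions above whose monitored quantities are stated explicitly (current draw, yaw error, no movement in the first 5s, landing for >1s after flying); the velocity and angular limits are left out for brevity. MonitorState and termination_reason are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

MAX_CURRENT_A = 15.0
MAX_YAW_ERROR_DEG = 45.0
NO_MOVEMENT_WINDOW_S = 5.0
LANDED_CONTACT_S = 1.0


@dataclass
class MonitorState:
    t: float                  # seconds since the evaluation started
    current_a: float          # current draw (A)
    yaw_error_deg: float      # yaw error (degrees)
    has_moved: bool           # has the hexacopter moved at all yet?
    has_flown: bool           # has it been airborne at some point?
    ground_contact_s: float   # duration of the current ground contact (s)


def termination_reason(s: MonitorState) -> Optional[str]:
    """Return why the evaluation should stop, or None to continue."""
    if s.current_a > MAX_CURRENT_A:
        return "current"
    if abs(s.yaw_error_deg) > MAX_YAW_ERROR_DEG:
        return "yaw_error"
    if s.t >= NO_MOVEMENT_WINDOW_S and not s.has_moved:
        return "no_movement"          # time-saving measure
    if s.has_flown and s.ground_contact_s > LANDED_CONTACT_S:
        return "landed"
    return None
```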
In its current configuration, the platform executes ten experimental repeats for each of the four strategies. Each repeat optimises a population of 20 controllers over a number of generations until convergence. Each generation involves the creation of 20 new individuals, which are evaluated on the test problem and potentially replace current population members. An evaluation involves an individual's control parameters being used by the hexacopter, and culminates with a fitness value being assigned to the individual. The experiment ends when every controller in the population can fly for the full 40s evaluation period (convergence). For brevity, we refer to the different strategies as STATIC (static mutation rates), ADAPT (self-adaptive, no restarts), INDIV (individual-level restarts) and GLOBAL (global-level restarts). The Mann-Whitney U-test is used to compare the strategies statistically.
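For reference, a minimal pure-Python computation of the Mann-Whitney U statistic, using average ranks for ties, is sketched below; with ten repeats per strategy, U would be computed from the two groups of ten per-repeat values. The example inputs in the usage note are toy values, not experimental data. In practice, scipy.stats.mannwhitneyu also returns the p-value.

```python
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0        # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (U_a, U_b); U_a counts pairs with a > b, ties counting 0.5."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2.0
    return u_a, len(a) * len(b) - u_a
```

For example, mann_whitney_u([1, 2, 3], [4, 5, 6]) gives (0.0, 9.0): no value in the first toy group exceeds any value in the second.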
Convergence. Table 1 and Fig. 4(a) reveal that all three self-adaptive strategies converge more rapidly than STATIC (all p<0.05), showing the benefits of self-adaptation over STATIC (although we note that STATIC is a baseline only). GLOBAL displays the best mean convergence generation (24.4), which is significantly better than STATIC and ADAPT (p<0.05), and similar to INDIV (27.6). Compared to GLOBAL and INDIV, ADAPT displays two outlier experiments (with convergence generations 171 and 173), resulting in statistically significant differences between them (p<0.05). We conclude that self-adaptation is beneficial to the evolutionary process, but that restarts are required to prevent unsuitable rate settings.
Fitness. Fitness trends can be seen in Fig. 4(b)-(d). The mean highest fitness values for GLOBAL (f=124036.7), ADAPT (f=124151.1) and STATIC (f=124020.7) are statistically similar. INDIV (f=127185.2, p<0.05) has statistically better high fitness than all three, indicating that it is beneficial to address the individual rate requirements of the controllers, which may be at different stages of the evolutionary process. INDIV also displays the best mean fitness (f=122562.6, p<0.05 compared to STATIC and ADAPT), due to its ability to adjust the instantaneous rates of each individual. GLOBAL has a high standard deviation, and so is statistically similar to all other strategies. This is likely because restarts are driven by global performance only, so individuals may be stuck with suboptimal control in suboptimal rate regions for as long as a single population member is improving the global fitness. INDIV has the highest low fitness (f=117480.6, p<0.05 compared to GLOBAL and STATIC), adding further support to the hypothesis that the extra context-sensitivity induced by individually monitoring the controllers for fitness stagnation outweighs the increased disruption to the evolutionary process.
We note that suboptimal controllers stuck in suboptimal regions through poor rate setting could potentially improve global performance, if their control vectors provide useful genetic material to the global fitness leader. This depends on the setting of the suboptimal controller vectors and how they interact with the crossover operator, and will be the subject of future research.
We note that both restart strategies have approximately double the standard deviation of high fitness (INDIV=2716, GLOBAL=3376) of ADAPT (1528) and STATIC (1429), showing some of the disruption caused by restarts. This pattern of high standard deviation for restart strategies is replicated for mean fitness and low fitness, indicating that it is a general property of restart strategies. Disruption is thought to be caused by rates (i) jumping around in the rate space during an experiment, (ii) self-adapting towards more promising areas of the rate space, only for the restart counter to be triggered before this adaptation completes, so that the rate is reset into an entirely new area and the self-adaptation is disrupted, and (iii) restarting into a more suboptimal area (effectively wasting the restart). In the experiment presented here, disruption was evidenced in large standard deviations, rather than in direct reductions in fitness and convergence. As only the fittest final controllers would be used to fly the hexacopter, we conclude that rate restarts are a viable strategy to control evolutionary rate divergence in our ER scenario.
Table 1: Comparison of common performance metrics between the four strategies. Standard deviations are shown in parentheses. Symbols indicate that the strategy is statistically better (p<0.05) than the strategy denoted by the symbol (e.g., * = ADAPT), as measured by a Mann-Whitney U-test.
Figure 4: (a) A comparative boxplot showing convergence generations, highlighting the outliers for ADAPT (171 and 173). Outliers are shown with diamonds. (b) High, (c) mean, and (d) low fitness averages for the four strategies over the first 100 generations. Lines are plotted until all repeats for a given strategy have converged.
Rates. The crossover rate CR displays no significant differences between the three self-adaptive strategies (INDIV=0.413, GLOBAL=0.536, ADAPT=0.482; Table 1, Fig. 5(a)). The introduction of restarts significantly increases the mean value of the differential weight F (INDIV=0.756, GLOBAL=0.758) compared to ADAPT (0.289, both p<0.05). In practice, a restart reinitialises F uniformly in the range [0,2] (following [31]), giving an expected new value of 1.0, which is subsequently self-adapted down towards the final values. As ADAPT has no mechanism to alter rates quickly, it converges gradually to its final value, with a corresponding decrease in the impact of the donor vector. When this value becomes too low (Fig. 5(b)), the algorithm struggles to escape local optima. If these optima do not correspond to successful controllers, the convergence generation becomes large. In contrast, restarts in INDIV and GLOBAL can be seen to increase F (for INDIV this change is notable after generation 10; for GLOBAL a more gradual increase is observed after generation 13). The change is more gradual for GLOBAL because (i) all rates are reset at once, so the population-mean F settles around the mean reinitialisation value, and (ii) fewer restarts are triggered, as global fitness stagnation is the trigger.
Restarts occur periodically throughout both INDIV and GLOBAL. GLOBAL restarts occur mainly between generations 10 and 30 (Fig. 5), with a mean of 3.7 restarts triggered per experimental repeat; as each restart affects all 20 individuals, this amounts to 74 individual rate resets per repeat (3.7 × 20).
As INDIV restarts are based on individual fitness progression, they are spread more uniformly across the generations. As is typical of self-adaptive approaches, the rates themselves vary from generation to generation based on how easily they locate successful children. INDIV uses a mean of 53 restarts per repeat. This difference from GLOBAL is not significant, as the rarity of restart triggering in GLOBAL is offset by each restart affecting the entire population. The difference in the effects of the strategies on parameter evolution is most clearly seen from generations 10-20 in Fig. 5(b).
In this study, we compared two different implementations for restarting key evolutionary rates during self-adaptive ER experiments, paying particular attention to the level at which restarts are implemented (i.e., individual or population), against two benchmarks: (i) a constant-rate strategy, and (ii) a no-restart strategy. Results indicate that restarts are useful for SA ER experiments, mainly to dissuade premature convergence at local optima caused by poorly set rates. These results agree with previous studies using self-adaptive hill-climbing algorithms for ER [25], but here the experimentation is expanded to cover population-based algorithms and to consider the two main evolutionary operators, crossover and mutation.
When tested on a hover experiment that optimises PID controllers on a real hexacopter, both INDIV and GLOBAL prevent the extreme outlier convergence generations observed in ADAPT. Restarts are seen to generate more variance in rate settings. Disruption is evidenced in the standard deviations of the fitness metrics, but not in degradation of attainable fitness values or convergence generations. Between the two restart strategies, INDIV emerges as our clear preference, as it attains higher-fitness controllers than all other strategies considered.
Future research will consider the effect of problem difficulty on restart strategy performance. Hover is a relatively simple behaviour, and we expect restart strategies to have much more of an impact on more challenging tasks, which are more likely to have multimodal fitness landscapes with multiple local optima. We also wish to experiment with different flying robots and payloads, and observe their effects on the evolutionary process.
Figure 5: Mean CR and F rates for all self-adaptive experiments throughout the experiments. Shaded regions denote standard error.
Max. pitch and roll rate: 115°/s
Pitch and roll rate noise threshold: 30°/s
Horizontal velocity noise threshold: 5cm/s
Max. horizontal velocity: 15cm/s
Vertical velocity noise threshold: 2cm/s
Max. vertical velocity in closed-loop system: 20cm/s
Attitude range limit: 15°
Height range limit: 10cm
Core height limit: 5cm
Core yaw limit: 15°
Yaw range limit: 160°
Core position limit: 8cm
Position range limit: 20cm
[1] Karl Johan Åström and Tore Hägglund. 2006. Advanced PID control. ISA-The Instrumentation, Systems, and Automation Society; Research Triangle Park, NC 27709.
[2] Arijit Biswas, Swagatam Das, Ajith Abraham, and Sambarta Dasgupta. 2009. Design of fractional-order PI controllers with an improved differential evolution. Engineering applications of artificial intelligence 22, 2 (2009), 343–350.
[3] Josh Bongard, Victor Zykov, and Hod Lipson. 2006. Resilient machines through continuous self-modeling. Science 314, 5802 (2006), 1118–1121.
[4] S. Bouabdallah, P. Murrieri, and R. Siegwart. 2004. Design and control of an indoor micro quadrotor. In 2004 IEEE International Conference on, Vol. 5. 4393–4398.
[5] I. Chiha, J. Ghabi, and N. Liouane. 2012. Tuning PID controller with multiobjective differential evolution. In Communications Control and Signal Processing (ISCCSP), 2012 5th International Symposium on. 1–4.
[6] R. De Nardi, J. Togelius, O.E. Holland, and S.M. Lucas. 2006. Evolution of Neural Networks for Helicopter Control: Why Modularity Matters. In Evolutionary Computation, 2006. CEC 2006. IEEE Congress on. 1799–1806.
[7] Cristian M Dinu, Plamen Dimitrov, Berend Weel, and AE Eiben. 2013. Self-adapting fitness evaluation times for on-line evolution of simulated robots. In Proceedings of the 15th annual conference on Genetic and evolutionary computation. ACM, 191–198.
[8] A. E. Eiben, R. Hinterding, and Z. Michalewicz. 1999. Parameter control in evolutionary algorithms. IEEE Transactions on Evolutionary Computation 3, 2 (Jul 1999), 124–141.
[9] A. E. Eiben, G. Karafotias, and E. Haasdijk. 2010. Self-Adaptive Mutation in On-line, On-board Evolutionary Robotics. In Self-Adaptive and Self-Organizing Systems Workshop (SASOW), 2010 Fourth IEEE International Conference on. 147–152.
[10] Manfred Eigen. 1973. Ingo Rechenberg Evolutionsstrategie Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. mit einem Nachwort von Manfred Eigen, Friedrich Frommann Verlag, Stuttgart-Bad Cannstatt.
[11] Peter J Fleming and Robin C Purshouse. 2002. Evolutionary algorithms in control systems engineering: a survey. Control engineering practice 10, 11 (2002), 1223–1241.
[12] Dario Floreano, Jean-Christophe Zufferey, and Claudio Mattiussi. 2003. Evolving Spiking Neurons from Wheels to Wings. In Dynamic Systems Approach for Embodiment and Sociality (Advanced Knowledge International, International Series on Advanced Intelligence), Vol. 6. 65–70. K. Murase and T. Asakura (eds.).
[13] Pablo Ghiglino, Jason L Forshaw, and Vaios J Lappas. 2015. Online Evolutionary Swarm Algorithm for Self-Tuning Unmanned Flight Control Laws. Journal of Guidance, Control, and Dynamics 38, 4 (2015), 772–782.
[14] M.A. Gongora, B.N. Passow, and A.A. Hopgood. 2009. Robustness analysis of evolutionary controller tuning using real systems. In Evolutionary Computation,
[15] E. Haasdijk, A. E. Eiben, and G. Karafotias. 2010. On-line evolution of robot controllers by an encapsulated evolution strategy. In IEEE Congress on Evolutionary Computation. 1–7.
[16] Inman Harvey, Philip Husbands, and David Cliff. 1994. Seeing the light: Artificial evolution, real vision. School of Cognitive and Computing Sciences, University of Sussex Falmer.
[17] Huub Heijnen, David Howard, and Navinda Kottege. 2017. A Testbed that Evolves Hexapod Controllers in Hardware. In Robotics and Automation (ICRA), 2017 IEEE International Conference on. IEEE, In Press.
[18] Owen E Holland and Renzo De Nardi. 2008. Coevolutionary Modelling of a Miniature Rotorcraft. In Intelligent Autonomous Systems 10 (IAS10). IOS Press.
[19] David Howard and Alberto Elfes. 2014. Evolving Spiking Networks for Turbulence-Tolerant Quadrotor Control. In International Conference on Artificial Life (ALIFE14). 431–438.
[20] David Howard and Torsten Merz. 2015. A platform for the direct hardware evolution of quadcopter controllers. In Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on. IEEE, 4614–4619.
[21] Gerard Howard, Ella Gale, Larry Bull, Ben de Lacy Costello, and Andy Adamatzky. 2012. Evolution of plastic learning in spiking networks via memristive connections. IEEE Transactions on Evolutionary Computation 16, 5 (2012), 711–729.
[22] Nick Jakobi, Phil Husbands, and Inman Harvey. 1995. Noise and the reality gap: The use of simulation in evolutionary robotics. In European Conference on Artificial Life. Springer Berlin Heidelberg, 704–720.
[23] Sylvain Koos, Jean-Baptiste Mouret, and Stéphane Doncieux. 2010. Crossing the Reality Gap in Evolutionary Robotics by Promoting Transferable Controllers. In Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation. ACM, New York, NY, USA, 119–126.
[24] Torsten Merz, Piotr Rudol, and Mariusz Wzorek. 2006. Control system framework for autonomous robots based on extended state machines. In Autonomic and . IEEE, 14–14.
[25] Jean-Marc Montanier and Nicolas Bredeche. 2011. Embedded Evolutionary Robotics: The (1+1)-Restart-Online Adaptation Algorithm. In New Horizons in Evolutionary Robotics, Springer Series: Studies in Computational Intelligence (Ed.). Springer, 155–169.
[26] S. Nolfi and D. Floreano. 2001. Evolutionary Robotics. The Biology, Intelligence, and Technology of Self-organizing Machines. (2001).
[27] B.N. Passow, M. Gongora, S. Coupland, and A.A. Hopgood. 2008. Real-time evolution of an embedded controller for an autonomous helicopter. In Evolutionary Computation, 2008. CEC 2008. (IEEE World Congress on Computational Intelligence). IEEE Congress on. 2538–2545.
[28] Mario G Perhinschi. 1997. A modified genetic algorithm for the design of autonomous helicopter control system. In Proceedings of the AIAA Guidance, Navigation and Control Conference. 1111–1120.
[29] Chad Phillips, Charles L Karr, and Greg Walker. 1996. Helicopter flight control with fuzzy logic and genetic algorithms. Engineering Applications of Artificial Intelligence 9, 2 (1996), 175–184.
[30] K. Y. W. Scheper, S. Tijmons, C. C. de Visser, and G. C. H. E. de Croon. 2016. Behavior Trees for Evolutionary Robotics. Artificial Life 22, 1 (Feb 2016), 23–48.
[31] Rainer Storn and Kenneth Price. 1997. Differential Evolution – A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces. J. of Global Optimization 11, 4 (Dec. 1997), 341–359.