between humans also involves physical human interaction, a technical term for a physical communication between two or more individuals in a shared context. The emergence of humanoid robots by the mid-nineties brought a new “individual” to interact within this shared context, thus extending the human-to-human interaction theory to human-robot interaction theory, i.e. Human-Robot Interaction (HRI).
Research efforts devoted to HRI have grown steadily over recent years [1], addressing new application domains in which new generations of robots begin to coexist and physically interact with humans (e.g., rehabilitation therapy [2], social interaction [3], education [4]), in contrast to the traditional, well-structured industrial robotic scenarios that lack HRI. Physical HRI implies robots operating in complex unstructured environments in which human actions cannot be modelled, thus demanding robot behaviour that is autonomous, reactive to unpredicted actions, adaptive and safe (i.e. human-like behaviour) [5]. Such compliant behaviour can be pursued through different design aspects of the robotic hardware (rigid vs. flexible materials, elastic actuators, low-power actuators, etc.) and software (position vs. torque control, adaptive control systems, etc.).
Regarding hardware design, robots can be equipped with passive intrinsic compliance by means of different elastic components, muscle-like actuators and/or soft materials. This approach, taking biology as an inspiration, offers a compliant alternative to classical rigid-bodied robots. Yet, traditional position control methods are not directly applicable in the presence of elastic materials whose mathematical modelling is almost intractable, thus demanding new control strategies [6, 7]. These traditional methods offer excellent accuracy for industrial rigid-bodied robots in well-structured environments (e.g. automated car factories) where HRI is explicitly avoided since neither safety nor compliance can be guaranteed. Compliance demands torque control, and torque control strategies based on dynamics modelling cannot be efficiently applied since the nonlinearities of elastic components make detailed modelling extremely complex [8]. Finding a solution for controlling biologically inspired robots carrying elastic components and low-power actuators should directly benefit from a better understanding of biological motor control itself.
The control mechanisms encountered in biology are involved in a continuous learning process to cope with the complexity and changes in the body structure and dynamics. Artificial Intelligence (AI) can be used to replicate this learning process; in particular, widely used Artificial Neural Networks (ANNs) have been proposed and tested as a solution
Ignacio Abadía, Francisco Naveros, Jesús A. Garrido, Eduardo Ros, Niceto R. Luque
for the control of these compliant robots without requiring prior knowledge of the robot dynamics [8, 9]. ANNs are loosely inspired by the functioning of their biological neural network counterparts. They consist of interconnected computational units, called artificial neurons, whose input information travels from one unit to another across the ANN. At the neuron level, the input is processed via some non-linear function of the sum of the neuron inputs, and the result, typically represented by a real number, is transmitted through the neuron connections. Neuron connections are adjusted as learning proceeds. ANNs are designed to address problems involving well-structured data, typically using standard analogue representations of neural activity. They cannot serve as the link between biological neural coding and movement coordination, which hampers any attempt at drawing biological analogies. Spiking Neural Networks (SNNs), also called the third generation of neural networks, constitute a more biologically plausible approach, as they model information transfer and processing as it occurs in biological neurons, i.e. via the precise timing of spikes (discrete events at points in time) [10]. Torque control deals with the robot inner dynamics, that is, the evolution through time of a physical system. This makes the SNNs' use of temporal coding adequate for capturing the temporal evolution of analogue sensorimotor signals [11], a pivotal feature in motor control and movement coordination [12]. These intrinsic characteristics make SNNs a suitable solution for adaptive robot control.
Several areas of the Central Nervous System (CNS) contribute to the temporal coordination implied in motor control, such as the premotor cortex, the parietal cortex, the primary motor cortex, and the cerebellum [13]. The cerebellum stands out for its role in the integration, regulation and coordination of motor processes and, more importantly, in motor learning [14-17]. It can be regarded as a separate area of the brain, attached underneath the cerebral hemispheres, whose neural structure is highly regular, in striking contrast to that of the cerebral cortex. This well-known structure makes it a suitable reference for the development of biologically plausible SNNs.
The depicted scenario yields three elements: 1) the cerebellum, a highly regular neural structure that is therefore relatively easy to replicate computationally and that is responsible for motor learning and coordination; 2) an artificial SNN that incorporates a continuous learning process at its core and is able to mimic biological neurons and neural processing; and 3) hardware-compliant robots lacking compliant control strategies. Here, we combine these three elements, taking a holistic approach to the HRI compliance problem.
Addressing this problem implies state-of-the-art challenges that we face throughout this work.
First, we need the cerebellar-like SNN to operate in RT. Spiking neural processing in RT is highly demanding in terms of computational cost. Given that our computational resources are limited, there must be a trade-off between network size, neuron complexity, network topology and temporal output resolution, which determines, to a certain extent, the motor control accuracy. We further developed our spiking neural simulator (EDLUT) to accommodate, for the first time, an RT cerebellar SNN consisting of ~62 K leaky integrate-and-fire (LIF) neurons with ~36.4 M synapses, 36 M of which are endowed with plasticity.
Second, we need to implement an effective RT dialogue between the network spike domain and the sensorimotor analogue domain. In a closed loop, movements arise as a consequence of sensory stimuli, so the SNN generating the motor commands must receive an adequate driving input in order to generate an adequate motor output. This task is entrusted to the primary motor cortex (M1), which generates this input drive as a transformed version of the initial sensory signal [18]. Here, we emulate this M1 sensory transformation using a set of analogue-to-spike/spike-to-analogue modules compatible with the Robot Operating System (ROS). These modules operate in RT without compromising motor accuracy.
Third, we need to cope with hardware/software compliance impositions. A compliant interaction with an unstructured environment [19] compels us to use a compliant robot (e.g. the Baxter robot) in direct torque control. Using a compliant robot, such as Baxter, forces us to compensate, via the SNN controller, for Baxter’s loss of precision and lower capacity to exert force due to its inner hardware compliance. We provided a compliant control in which a cerebellar-like SNN is able to continuously learn the minimal torque values needed to execute certain motor tasks in RT even under changing operational and ambient conditions, i.e. perturbation forces that continuously readjust their magnitude and direction, human collisions, and interactions.
Finally, we need to assess the quality of the implemented solution. We have provided a compliant control in which an SNN is able to learn the adequate torque values in a safe manner. Notably, our compliant control outperforms the accuracy achieved by the default factory-installed position control.
All in all, this work addresses the aforementioned technical difficulties and, as a result, provides a novel control strategy for hardware-compliant robots based on a spiking cerebellar structure that replicates the biological learning mechanisms involved in motor control.
control and movement coordination [15, 17] to implement a novel control strategy for hardware compliant robots. It is thus appropriate to evaluate our cerebellar-like model in the field of robot dynamics control in terms of performance under a set of different conditions. To this aim, we proposed a specific way of performing the experimental evaluation through two trajectory families. 1) On the one hand, we tested our cerebellar controller in reaching movements; that is, fast, ballistic arm
where K = 1000 denotes the number of samples of the two-second trajectories; and N = 6 is the number of joints. The MAE provided a numerical performance indicator for the quality of the cerebellar controller, thus allowing us to compare it against the default factory-installed position control.
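As an illustration only, the per-trial MAE can be sketched in Python as below. The averaging over both samples and joints is an assumption inferred from the definitions of K and N; the function name `trial_mae` and the sample data are hypothetical.

```python
def trial_mae(desired, actual):
    """Mean absolute error over all samples and joints of one trial.

    desired, actual: sequences of K samples, each holding N joint positions.
    Assumption: the error is averaged over both samples and joints.
    """
    if len(desired) != len(actual) or not desired:
        raise ValueError("trajectories must be non-empty and equally long")
    total = 0.0
    count = 0
    for desired_sample, actual_sample in zip(desired, actual):
        if len(desired_sample) != len(actual_sample):
            raise ValueError("each sample must hold the same number of joints")
        for q_des, q_act in zip(desired_sample, actual_sample):
            total += abs(q_des - q_act)
            count += 1
    return total / count


K, N = 1000, 6  # samples per two-second trial and number of joints (from the text)
desired = [[0.0] * N for _ in range(K)]
actual = [[0.02] * N for _ in range(K)]  # constant offset, made-up data
mae = trial_mae(desired, actual)
```

With K = 1000 samples per two-second trial, one sample corresponds to one 2 ms control step.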
B. The Compliant Robot; the Baxter Robot
The Baxter robot®, manufactured by Rethink Robotics™ [26], is a collaborative robot consisting of two arms with seven degrees of freedom (DoF) each. Baxter implements torque control and is inherently compliant thanks to its series elastic actuators (SEAs) [27]. These SEAs interpose a spring between the motor/gearing elements and the final motor output. These springs are deformable under human interaction and therefore constitute a built-in mechanism that inherently allows for safe, compliant physical HRI.
Prior to Baxter’s hands-on testing, we used the simulated version of Baxter available in Gazebo as a safe environment to develop and test the robot-cerebellum interface [28]. This interface was developed using ROS to control both the simulated and the real robot. ROS allowed sending motor commands (torque commands) to the robot and receiving sensorimotor information (joint positions and velocities) from the robot sensors [29]. The designed trajectories for our study involved the torque control of 6 DoF of one arm of the robot.
C. Cerebellar Control Loop
The Baxter robot and the cerebellar network interconnection required the establishment of a dialogue in which the exchange of sensorimotor information modified the behaviour of one another. This dialogue was framed within a closed control loop with negative feedback. See Fig. 1 for a control loop overview.
The cerebellar-like spiking model (implemented in EDLUT, see below) acted as the controller and computed a motor command at each time step (2 ms) to achieve the goal behaviour. To this aim, the controller computed the neural activity using as input information the robot state, the ideal trajectory to be performed by the robot arm, and the
instructive signal obtained. The robot state (actual position, Qa, and actual velocity, Q̇a, per joint) was provided by Baxter’s sensors and then mapped into control signals. The desired trajectory signals (position, Qd, and velocity, Q̇d, per joint) were provided by a trajectory generator module representing the motor cortex and other motor areas. The instructive ɛ signals (one per joint) were obtained by comparing the desired trajectory with the robot state signals. Once the cerebellar network computed a motor command, it was sent to the robot, inducing arm movement. Consequently, the sensory information entering the cerebellar network was modified, thus closing the loop. Cerebellar input and output signals were updated every 2 ms (500 Hz), guaranteeing low latency, a mandatory requirement for RT control.
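One cycle of this closed loop can be sketched as follows. This is a minimal illustration, not the ROS/EDLUT implementation: the weighted sum of position and velocity errors used for ɛ, the gains `k_pos` and `k_vel`, and all function names are assumptions, and the controller is a stand-in callable.

```python
T_STEP = 0.002  # s, control period stated in the text (500 Hz)


def instructive_signals(q_des, dq_des, q_act, dq_act, k_pos=1.0, k_vel=1.0):
    """Per-joint error signal (assumed weighted sum of position and velocity errors)."""
    return [
        k_pos * (qd - qa) + k_vel * (dqd - dqa)
        for qd, dqd, qa, dqa in zip(q_des, dq_des, q_act, dq_act)
    ]


def control_cycle(t, desired_at, read_state, controller, send_torques):
    """Run one control step: sense, compare, compute torques, actuate."""
    q_act, dq_act = read_state()            # robot state from the sensors
    q_des, dq_des = desired_at(t)           # trajectory generator output
    eps = instructive_signals(q_des, dq_des, q_act, dq_act)
    torques = controller(q_des, dq_des, q_act, dq_act, eps)
    send_torques(torques)                   # the next read_state() reflects this command
    return eps, torques


# Toy usage with two joints and a proportional stand-in for the cerebellar network.
sent = []
eps, torques = control_cycle(
    t=0.0,
    desired_at=lambda t: ([1.0, 0.0], [0.0, 0.0]),
    read_state=lambda: ([0.5, 0.0], [0.0, 0.0]),
    controller=lambda q_des, dq_des, q_act, dq_act, eps: [2.0 * e for e in eps],
    send_torques=sent.append,
)
```

Calling `control_cycle` once per `T_STEP` closes the loop, since each torque command changes the state returned by the next `read_state()` call.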
The cerebellar controller ran in the EDLUT simulator [30-32]. EDLUT is mainly oriented to embodiment experimentation, so that neural computation can be slowed down or sped up to cope with the RT requirements imposed by a real body, e.g. a humanoid robot [33, 34]. Regarding the theoretical concepts underpinning our cerebellar controller, please see our previous works [33, 35] on spike-analogue interfaces, [22, 36-38] on cerebellar learning, [37-40] on the cerebellar granular layer, and [38, 41] on cerebellar control loops and neurorobotics.
D. Cerebellar Controller; the Neural Network
The cerebellar network controller consisted of five neural layers: 1) mossy fibres (MFs), 2) granule cells (GCs), 3) climbing fibres (CFs), 4) Purkinje cells (PCs), and 5) deep cerebellar nuclei (DCN) (see Fig. 2). The cerebellar network was in turn divided into six micro-complexes [42], each one focusing on controlling a different joint of Baxter.
The MFs constituted the input layer through which the input sensorimotor information was conveyed (actual and desired joint position and velocity trajectories translated into spiking patterns) towards the inner cerebellar network layers. These MFs projected excitatory afferents on both GCs and DCN. GCs, then, processed and re-coded this sensorimotor information in a sparse somatosensory neural activity that was later propagated by the parallel fibres (PFs) (i.e. excitatory GCs’ axons) to the PCs. These PCs, in turn, correlated this somatosensory activity coming from PFs with the neural
activity conveyed by the CFs (i.e. excitatory inferior olive, IO, axons). The CF neural activity, generated in the olivary system, represented the mismatch between the actual and desired trajectories for each of Baxter’s joints and acted as an instructive signal. PCs underwent synaptic plasticity, that is, a supervised mechanism that correlated both PF and CF neural activities and adapted the PF synaptic weight distribution accordingly. The cerebellar input-output response was thus adjusted and, therefore, the movement error was minimised [43] in subsequent executions. Finally, the DCN closed the cerebellar loop via the excitatory synapses coming from MFs and CFs together with the inhibitory synapses from PCs. The DCN neural activity of each micro-complex ultimately drove one of Baxter’s joints by means of a spike-to-torque command transformation.
1) MFs (240) were modelled as input fibres able to propagate the sensorimotor information towards GCs and DCN at each simulation time step (2 ms). These 240 fibres were organised into six groups of 40 fibres each, i.e. one group per joint. Each MF group was in turn subdivided into four equal subgroups onto which actual and desired joint positions and velocities were directly mapped. Only four non-overlapping MFs per group (one per subgroup) were active at each simulation time step, representing the actual input neural state.
2) GCs (60,000) were modelled as LIF neurons emulating a state generator [40, 44, 45]. These 60,000 neurons were organised into six groups of 10,000 neurons each, i.e. one group per joint. Each GC received four input synapses [46], one coming from each subgroup belonging to the very same MF group. The connectivity pattern between GC
Fig. 1. Schematic of the cerebellar closed-loop control. The mossy fibres (MFs) convey the sensory signals, whilst the climbing fibres (CFs) convey the instructive signals, thus providing the inputs to the cerebellar network. The deep cerebellar nuclei (DCN) drive the cerebellar torque output commands. MFs project sensorimotor information onto granule cells (GCs) and DCN. GCs, in turn, project onto Purkinje cells (PCs) through parallel fibres (PFs). PCs also receive excitatory inputs from the CFs. Finally, DCN receive excitatory inputs from the MFs and CFs and inhibitory inputs from the PCs.
Fig. 2. Cerebellar scheme. Schematic representation of the main neural layers, cells, connections, and the plasticity site considered in the cerebellar model.
and MF groups was designed in such a way that non-overlapping GC neural activation could uniquely represent all possible MF neural input combinations. Importantly, this connectivity pattern facilitated the transformation of the sensorimotor neural information into a set of somatosensory neural activations that were easy to read out by the subsequent PC layer.
3) CFs (600) were modelled as input fibres able to propagate the instructive signal (mismatch between the actual and desired trajectories of each joint) towards PCs and DCN. These 600 fibres were organised into six micro-complexes of 100 fibres each, i.e. one per joint. Each micro-complex was also divided into two symmetrical subgroups, each one dedicated to controlling the clockwise/anticlockwise movement of the robot joint actuator (emulating the agonist/antagonist interplay in human muscles). A probabilistic Poisson process transformed the error obtained when comparing the actual and desired trajectories per joint into CF spiking neural activations. Each CF spike encoded well-timed information regarding the instantaneous error. The probabilistic spike sampling of the error ensured a proper representation of the whole error region over trials, whilst maintaining the CF activity between 1 and 10 Hz per fibre (similar to electrophysiological data [47]). The error evolution could be sampled accurately even at such a low frequency [38, 48].
4) PCs (600) were modelled as LIF neurons. These 600 neurons were organised into six micro-complexes of 100 neurons each, i.e. one per joint. Each micro-complex was also divided into two symmetrical subgroups, each one dedicated to controlling the clockwise/anticlockwise movement of the robot joint actuator. Each PC was connected to all PFs, thus receiving the sensorimotor information concerning all joints at once. CFs and PCs were connected one-to-one, maintaining the six-micro-complex architecture. Thus, each PC micro-complex received the same sensorimotor information via PFs, but a different instructive signal through its corresponding CF micro-complex. Correlating these two different sources of neural information allowed each PC micro-complex to adapt the cerebellar input-to-output response of its corresponding Baxter joint via a plasticity mechanism that modified the overall PF synaptic weight distribution (see synaptic plasticity subsection).
5) DCN (600) were modelled as LIF neurons. These 600 neurons were organised into six micro-complexes of 100 neurons each, i.e. one per joint. Each micro-complex was also divided into two symmetrical subgroups, each one dedicated to controlling the clockwise/anticlockwise movement of the robot joint actuator. Each DCN cell was innervated by an inhibitory afferent from a PC and an excitatory afferent from the CF which simultaneously innervated the same PC. Each DCN cell also received excitatory projections from all MFs (which maintained the baseline DCN activity). This neural topology has been summarised in Table I.
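The layer sizes and connectivity rules listed above can be cross-checked with a short Python sketch, which also illustrates two of the encodings. It is not the EDLUT implementation. The mixed-radix wiring in `gc_index` is one assumed way of assigning a distinct GC to every combination of active MFs, and the linear mapping from a normalised error in [0, 1] to a CF rate between 1 and 10 Hz is likewise an assumption; all names are hypothetical.

```python
import random

N_JOINTS = 6
MF_PER_SUBGROUP = 10           # 40 MFs per joint, split into four subgroups
GC_PER_JOINT = 10_000
CELLS_PER_MICROCOMPLEX = 100   # CFs, PCs and DCN cells per joint
T_STEP = 0.002                 # s


def synapse_counts():
    """Synapses per projection, following the connectivity rules in the text."""
    n_mf = N_JOINTS * 4 * MF_PER_SUBGROUP                    # 240
    n_gc = N_JOINTS * GC_PER_JOINT                           # 60,000
    n_cf = n_pc = n_dcn = N_JOINTS * CELLS_PER_MICROCOMPLEX  # 600 each
    return {
        "MF-GC": 4 * n_gc,        # four input synapses per GC
        "PF-PC": n_gc * n_pc,     # every PC receives all PFs (plastic)
        "CF-PC": n_cf,            # one-to-one
        "PC-DCN": n_pc,           # one inhibitory afferent per DCN cell
        "CF-DCN": n_cf,           # one excitatory afferent per DCN cell
        "MF-DCN": n_mf * n_dcn,   # every DCN cell receives all MFs
    }


def gc_index(active_mf):
    """Map the active fibre (0-9) of each of the four MF subgroups to one GC (0-9999).

    Assumed mixed-radix wiring: 10 x 10 x 10 x 10 combinations, one GC each.
    """
    if len(active_mf) != 4:
        raise ValueError("expected one active fibre per subgroup")
    index = 0
    for fibre in active_mf:
        if not 0 <= fibre < MF_PER_SUBGROUP:
            raise ValueError("fibre index out of range")
        index = index * MF_PER_SUBGROUP + fibre
    return index


def cf_spike_probability(norm_error, f_min=1.0, f_max=10.0, t_step=T_STEP):
    """Per-step spike probability for an assumed linear error-to-rate mapping."""
    error = min(max(abs(norm_error), 0.0), 1.0)
    return (f_min + (f_max - f_min) * error) * t_step


def sample_cf_spikes(norm_error, n_fibres, rng):
    """Draw one Bernoulli sample per fibre, approximating a Poisson process."""
    p = cf_spike_probability(norm_error)
    return [rng.random() < p for _ in range(n_fibres)]


counts = synapse_counts()
total_synapses = sum(counts.values())
spikes = sample_cf_spikes(0.5, 50, random.Random(0))
```

Under these rules the total is 36,385,800 synapses, of which the 36,000,000 PF-PC synapses are plastic, in line with the ~36.4 M and 36 M figures quoted earlier.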
The DCN neural activity was then transformed into an analogue torque command (τcer) per micro-complex and then sent to Baxter’s actuators. This spike-to-analogue conversion was computed at each time step, Tstep = 0.002 s, using (5-7), where the cell index defines the DCN tag number within the micro-complex related to joint j (the first 50 DCN cells encoding the agonist movement and the last 50 DCN cells encoding the antagonist movement), and δ stands for the Dirac delta function representing a spike event.
The spike-to-analogue conversion in (5) and (6) was then convolved with a mean filter (7) acting as a DCN activity eligibility trace; that is, a temporary record of the occurrence of previous DCN spike events. The fifteen-tap mean filter helped us to emulate the low-pass filter behaviour of muscles. The final torque output per joint was modulated by a per-joint factor that adapted the normalised DCN output to the joint’s relative position, orientation and mass; the factors for the six joints were (0.75, 1.0, 0.375, 0.5, 0.05, 0.05) N·m/spike.
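The conversion described above can be illustrated with a short Python sketch. It does not reproduce (5-7) exactly: taking the net activity as agonist minus antagonist spike counts per step, and omitting any further normalisation, are assumptions. The class name `SpikeToTorque` is hypothetical; the gains and the fifteen taps come from the text.

```python
from collections import deque

T_STEP = 0.002   # s, conversion period from the text
N_TAPS = 15      # mean-filter length from the text
GAINS = (0.75, 1.0, 0.375, 0.5, 0.05, 0.05)  # N·m/spike, one per joint (from the text)


class SpikeToTorque:
    """Convert the DCN spikes of one micro-complex into a torque command."""

    def __init__(self, gain, n_taps=N_TAPS):
        self.gain = gain
        # Sliding window holding the net spike count of the last n_taps steps.
        self.window = deque([0.0] * n_taps, maxlen=n_taps)

    def step(self, spiking_cells):
        """spiking_cells: indices (0-99) of DCN cells that fired in this time step."""
        agonist = sum(1 for i in spiking_cells if 0 <= i < 50)
        antagonist = sum(1 for i in spiking_cells if 50 <= i < 100)
        self.window.append(float(agonist - antagonist))  # assumed net activity
        mean_activity = sum(self.window) / len(self.window)
        return self.gain * mean_activity


# Toy usage: three agonist cells fire at every step for 15 steps.
decoder = SpikeToTorque(gain=1.0)
outputs = [decoder.step([0, 1, 2]) for _ in range(N_TAPS)]
```

The output ramps up over the first fifteen steps (30 ms) before settling, which is the low-pass behaviour the mean filter is meant to provide.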
E. Spiking Neuron Models
The cerebellar neural network consisted of LIF neurons [49] due to their minimal computational cost in spike generation and processing, a key factor in RT computation. Our LIF neurons elicited a spike only once their membrane potential reached a certain threshold; immediately afterwards, the membrane potential was reset. The LIF neural dynamics were defined solely by the membrane potential and the excitatory (AMPA and NMDA) and inhibitory (GABA) chemical conductances as follows
where Cm denotes the membrane capacitance; V is the membrane potential; Iinternal is the internal current and Iexternal the external current. EL is the resting potential and gL the conductance responsible for the passive decay term towards the resting potential. Conductances gAMPA, gNMDA and gGABA integrate all the contributions received by each receptor type (AMPA, NMDA, GABA) through individual synapses, together with a term describing the NMDA activation channel. These conductances were defined as decaying exponential functions [30, 49] whose values were incremented proportionally to the synaptic weights (w) upon each presynaptic spike arrival (Dirac delta functions). When the membrane potential reached a threshold (Vthr), it was then reset to EL during the refractory period (Tref). The configuration parameters for the three neuron types modelled are shown in Table II.
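A conductance-based LIF neuron of this kind can be sketched in Python as follows. This is a simplified illustration, not the EDLUT model: it uses forward-Euler integration, it omits the NMDA conductance and its activation term, and every default parameter value is a placeholder rather than a value from Table II.

```python
import math


class LIFNeuron:
    """Leaky integrate-and-fire neuron with AMPA and GABA conductances."""

    def __init__(self, c_m=2e-12, g_l=0.2e-9, e_l=-0.070, v_thr=-0.040,
                 e_ampa=0.0, e_gaba=-0.080, tau_ampa=0.5e-3, tau_gaba=10e-3,
                 t_ref=1e-3):
        # All defaults are placeholder values (SI units), not those of Table II.
        self.c_m, self.g_l, self.e_l, self.v_thr = c_m, g_l, e_l, v_thr
        self.e_ampa, self.e_gaba = e_ampa, e_gaba
        self.tau_ampa, self.tau_gaba, self.t_ref = tau_ampa, tau_gaba, t_ref
        self.v = e_l
        self.g_ampa = 0.0
        self.g_gaba = 0.0
        self.refractory_left = 0.0

    def step(self, dt, ampa_weight=0.0, gaba_weight=0.0, i_ext=0.0):
        """Advance the neuron by dt seconds; return True if it spiked."""
        # Presynaptic spikes increment the conductances by the synaptic weight.
        self.g_ampa += ampa_weight
        self.g_gaba += gaba_weight
        spiked = False
        if self.refractory_left > 0.0:
            # Membrane potential is held at rest during the refractory period.
            self.refractory_left -= dt
            self.v = self.e_l
        else:
            i_int = (-self.g_ampa * (self.v - self.e_ampa)
                     - self.g_gaba * (self.v - self.e_gaba)
                     - self.g_l * (self.v - self.e_l))
            self.v += dt * (i_int + i_ext) / self.c_m
            if self.v >= self.v_thr:
                spiked = True
                self.v = self.e_l
                self.refractory_left = self.t_ref
        # Conductances decay exponentially between presynaptic spikes.
        self.g_ampa *= math.exp(-dt / self.tau_ampa)
        self.g_gaba *= math.exp(-dt / self.tau_gaba)
        return spiked


DT = 1e-4  # s, integration step chosen for this sketch
driven = LIFNeuron()
n_spikes = sum(driven.step(DT, i_ext=10e-12) for _ in range(1000))
```

With these placeholder values, a constant 10 pA external current drives the membrane above threshold, so the neuron fires repeatedly over the simulated 100 ms, whereas an undriven neuron stays at rest.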
F. Synaptic Plasticity
The adaptive motor process of the cerebellar network was implemented through an STDP mechanism located at PF-PC synapses. This STDP mechanism balanced long-term potentiation (LTP) and long-term depression (LTD) at the PC synaptic level as follows
where the left-hand side denotes the synaptic weight change between a PF and its target PC; the synaptic efficacy increment is 0.002 nS; the Dirac delta function corresponds to an afferent spike from a PF; the synaptic efficacy decrement is -0.001 nS; and the kernel function k(x) is defined as
where the time constant (100 ms) is aligned with the biological sensorimotor pathway delay (~100 ms), i.e. the time elapsed across sensory information reception, information transmission along nerve fibres, neural processing and the final motor output response [50]. A second parameter (0.07 s) allows for the adjustment of the kernel width. The kernel reaches its maximum value (k(x) = 1) for PF spikes arriving about 100 ms before the CF spike and is zero or close to zero far from that point. The STDP rule defined by (15-17) caused a fixed synaptic efficacy increment (LTP) each time a spike arrived through the PFs to the target PC and a variable synaptic efficacy decrement (LTD) each time a spike arrived through a CF to the target PC. The amount of synaptic decrement depended on the activity that had arrived through the PFs prior to the CF spike: this activity was convolved with the integrative kernel defined in (17) and multiplied by the synaptic decrement. The effect of the presynaptic spikes arriving through PFs was maximal within the 100 ms time window before the postsynaptic CF spike arrival, thus accounting for the sensorimotor pathway delay [38, 41, 51].
This STDP mechanism correlated the neural activity patterns coming through the PFs towards PCs with the instructive signals coming from CFs towards PCs. This correlation process at the PC level identified certain PF activity patterns encoding certain sensorimotor information and, consequently, diminished the PC output activity by a PF-PC synaptic weight reduction. A reduction in the PC activation caused a subsequent reduction in the PC inhibitory action over the target DCN. Conversely, in the absence of any correlation, the STDP mechanism increased the PC output activity by a PF-PC synaptic weight potentiation. Since the DCN were driven by a near-constant baseline MF activation, a lack of PC inhibitory action would cause an increasing DCN activity, whereas an increasing PC inhibitory action would do the opposite. Well-timed sequences of increasing/decreasing levels of DCN activation during the learning acquisition process ultimately shaped the cerebellar output activity and diminished the overall error.
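The interplay between fixed LTP and kernel-weighted LTD can be sketched in Python as below. The kernel used here is a stand-in, not the kernel of (17): a Gaussian-shaped bump that peaks at 1 for PF spikes arriving 100 ms before the CF spike, with a 70 ms width parameter, is assumed purely for illustration. Function names are hypothetical; the 0.002 nS and -0.001 nS steps come from the text.

```python
import math

LTP_STEP = 0.002    # nS, fixed increment per PF spike (from the text)
LTD_STEP = -0.001   # nS, decrement scale per CF spike (from the text)
TAU_LTD = 0.100     # s, sensorimotor pathway delay (from the text)
WIDTH = 0.070       # s, kernel width parameter (from the text)


def kernel(x):
    """Assumed stand-in kernel; x = t_PF - t_CF in seconds (non-positive).

    It equals 1 when the PF spike precedes the CF spike by TAU_LTD and
    decays towards 0 away from that point. PF spikes after the CF spike
    are ignored.
    """
    if x > 0.0:
        return 0.0
    return math.exp(-((x + TAU_LTD) / WIDTH) ** 2)


def weight_change(pf_spike_times, cf_spike_times):
    """Net weight change (nS) of one PF-PC synapse for the given spike times."""
    # LTP: fixed increment for every PF spike.
    delta_w = LTP_STEP * len(pf_spike_times)
    # LTD: for every CF spike, weigh the preceding PF spikes with the kernel.
    for t_cf in cf_spike_times:
        correlated = sum(kernel(t_pf - t_cf) for t_pf in pf_spike_times)
        delta_w += LTD_STEP * correlated
    return delta_w


# A PF spike followed 100 ms later by a CF spike: LTP and maximal LTD combine.
dw_correlated = weight_change([0.0], [0.1])
# A PF spike with no CF spike: only LTP applies.
dw_uncorrelated = weight_change([0.0], [])
```

In this toy case the PF spike that is followed by an error signal ends up with a smaller net increase than the uncorrelated one; repeated PF activity before each CF spike would drive the weight down.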
G. ROS modules implementation
The control loop consisted of three main elements: 1) trajectory generator, 2) cerebellar controller, and 3) Baxter robot. The implementation and communication amongst these three elements were developed using ROS, allowing modularity. Fig. 3 depicts the control loop diagram in which each block defines a ROS module and each black arrow represents a ROS topic that establishes the communication between ROS modules exchanging either analogue signals or spike trains.
This control loop was designed accounting for the sensorimotor pathway delay (~100 ms) [52]. The 100 ms delay comprised the efferent (50 ms) and afferent (50 ms) pathway delays (Fig. 3, dashed red arrows). A motor command originated at time t in the cerebellum was applied by the robot actuators at time t + 50 ms and its effect was sensed back at the cerebellum at time t + 100 ms.
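A fixed pathway delay of this kind can be emulated with a simple FIFO buffer, as in the Python sketch below. It assumes an even 50 ms / 50 ms split between the efferent and afferent pathways and a 2 ms step; the class name `DelayLine` is hypothetical and the sketch is independent of the ROS modules.

```python
from collections import deque

T_STEP = 0.002          # s, control period
EFFERENT_DELAY = 0.050  # s, cerebellum to actuators
AFFERENT_DELAY = 0.050  # s, sensors to cerebellum (assumed equal split)


class DelayLine:
    """FIFO buffer that returns each pushed value a fixed number of steps later."""

    def __init__(self, delay, t_step=T_STEP, initial=0.0):
        n_steps = round(delay / t_step)  # 25 steps for 50 ms at 2 ms
        self.buffer = deque([initial] * n_steps)

    def push(self, value):
        """Store the newest value and return the one that is now due."""
        self.buffer.append(value)
        return self.buffer.popleft()


# Toy usage: commands 0, 1, 2, ... reach the actuators 25 steps (50 ms) later.
efferent = DelayLine(EFFERENT_DELAY)
applied = [efferent.push(command) for command in range(30)]
```

Chaining one such line for the motor commands and another for the sensory feedback yields the overall 100 ms round trip.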
We tested our cerebellar-like controller under different conditions, i.e. behavioural tasks, taking the default factory-installed position control mechanism as a performance baseline to validate the results. The aforementioned circular, eight-like and target-reaching trajectories constituted our cerebellar benchmark, which was completed with a set of interactions in an unstructured environment to test compliance.
This first behavioural task consisted of following a 120 mm radius circular path in the horizontal plane (xy), repeated over time to facilitate learning and adaptation, each trial lasting 2 seconds. The STDP mechanism governing the learning process modulated the cerebellar output (see Methods), driving the robot’s behaviour towards the goal. The behavioural evolution through time is illustrated in Fig. 4. Three snapshots were taken at three different moments of the cerebellar learning process: initial, intermediate, and final stage.
1) Initial learning stage: The cerebellar model started learning from scratch. At an initial learning stage [Fig. 4 (left column)] the synaptic adaptation mechanism at PF-PC synapses that correlated the somatosensory information with the CF instructive signal was not yet effectively deployed. Thus, the inhibitory action from PCs onto DCN was marginal, and the DCN output activity was saturated as it responded solely to the excitation coming from MF and CF afferents [Fig. 4 (a), first row]. Consequently, the corresponding initial torque commands [Fig. 4 (a), second row] were far from leading the robot towards the desired goal [Fig. 4 (a), third row;
Fig. 3. Detailed cerebellar closed-loop control scheme.
and (d)]. As depicted in Fig. 4 (d), the density function generated from the 10 trials before the t1 snapshot [Fig. 4 (left column)] reveals that the robot was still exploring the working area, performing inconsistent, dispersed movements.
2) Intermediate learning stage: At an intermediate learning stage [Fig. 4 (central column)], the synaptic adaptation allowed the recognition of some somatosensory patterns at the PCs, which was reflected in an emerging differentiated DCN activity between agonist and antagonist subgroups at each micro-complex [Fig. 4 (b), first row]. Consequently, the robot’s behaviour began to approach the desired goal [Fig. 4 (b), third row; and (e)].
3) Final learning stage: Once the learning process reached advanced stages [Fig. 4 (right column)], the robot executed the desired trajectory with minimal error. The agonist/antagonist DCN activity was clearly differentiated at each micro-complex [Fig. 4 (c), first row], and translated into the required torque commands via a spike-to-analogue conversion (see Methods). The synaptic adaptation process was reflected in a clear evolution of the torque values compared to previous stages, directly affecting the robot output behaviour. All joints closely followed the desired trajectory at this stage [Fig. 4 (c), third row] and, consequently, the end-effector barely deviated from the desired circular path [Fig. 4 (f)], behaving consistently around the goal trajectory over trials.
The overall performance through the learning process is depicted in Fig. 4 (g), illustrating how the performance of the cerebellar-like controller improved as adaptation and learning progressed. The MAE evolution indicates that the cerebellar controller needed about 300 trials (i.e. 600 seconds) to converge, outperforming the accuracy of the default factory-installed position control baseline (0.019 ± 0.003 vs. 0.077 ± 0.0004, Table III).
discussed circle-shaped; it had a “radius” of 120 mm and each trial lasted 2 seconds. In terms of robot dynamics, the eight-like trajectory was more demanding than the circular trajectory, as faster and steeper changes in velocity magnitude and direction were required for trajectory completion [24]. Nonetheless, the obtained results were equally satisfactory (see Table III).
1) Initial learning stage: At an early learning stage [Fig. 5 (left column)] the robot’s behaviour was clearly far from the desired goal. DCN activity at this stage responded exclusively to the excitatory drive from MF-DCN and CF-DCN afferents and was therefore saturated [Fig. 5 (a), first row]. The MAE value was high (0.165) and the performed trajectory was far from the goal [Fig. 5 (a), third row; (d), and (g)].
2) As learning progressed, the PF-PC synaptic adaptation mechanism began shaping the DCN activity, causing an incipient neural activity differentiation between agonist and antagonist micro-complexes [Fig. 5 (b), first row]. In
Fig. 4. Behavioural evolution through circular trajectory trials (2 s). (a) Initial learning stage (t1=18-20 s). (b) Intermediate learning stage (t2=318-320 s). (c) Final learning stage (t3=998-1000 s). The first row depicts the cerebellar output activity (DCN layer), whereas the second row shows its analogue conversion into torque commands. The third row illustrates the desired vs. actual trajectory per joint. (d), (e), and (f) reveal the desired vs. actual trajectory of the end-effector in Cartesian space at t1, t2, and t3 respectively, along with the density functions corresponding to the performed trajectories of the prior 10 trials. (g) Represents the position Mean Absolute Error (MAE) per trial through the learning process. Comparison of the MAE of each joint and the mean of all joints with the default factory-installed position control baseline performance.
consequence, the corresponding torque values significantly differed from those of early stages [Fig. 5 (b), second row], and the robot’s behaviour began to approach the desired one [Fig. 5 (b), third row; and (e)].
3) Finally, once learning was fully deployed the robot behaved as desired [Fig. 5 (c), third row; and (f)]. The DCN activity was clearly sculpted to produce the torque commands needed to perform the desired trajectory [Fig. 5 (c)], maintaining a stable behaviour over trials (0.017 ± 0.003). The greater difficulty of the eight-like trajectory was reflected in a lower convergence speed for the cerebellar-like controller to reach a stable behaviour (Table III shows a slower MAE convergence speed than for the circular trajectory). However, the final performance accuracy obtained also outperformed the default factory-installed position control baseline (0.017 ± 0.003 vs. 0.063 ± 0.0003).
C. Target Reaching
This task consisted of eight different reaching movements, sharing the same starting point. The challenge lay in the high speed of the movements and the randomness in the order of trials (transitions between the eight reaching movements were stochastic). The growth in complexity for the cerebellar controller was illustrated by a lower MAE convergence speed, higher inter-trial standard deviation values, and the need for more trials to reach stability than in the two previous behavioural tasks (Table IV). Nevertheless, the cerebellar-like controller was able to perform these ballistic movements, improving its performance through learning and again reaching better accuracy than the default factory-installed position control mechanism [Fig. 6] (0.019 ± 0.006 vs. 0.026 ± 0.006).
Therefore, the cerebellar-like controller was able to perform not only accurate smooth trajectories but also fast ballistic movements.
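The per-trial position MAE and the "mean ± standard deviation" figures quoted above can be computed along the following lines. This is a minimal sketch rather than the authors' analysis code: the array layout (samples × joints), the function names `trial_mae` and `stable_performance`, the window of 10 trials, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def trial_mae(desired, actual):
    """Position MAE of a single trial.

    desired, actual: arrays of shape (n_samples, n_joints) with joint
    positions (assumed layout). Returns (per_joint_mae, mean_over_joints).
    """
    desired = np.asarray(desired, dtype=float)
    actual = np.asarray(actual, dtype=float)
    per_joint = np.mean(np.abs(desired - actual), axis=0)
    return per_joint, float(per_joint.mean())

def stable_performance(mae_per_trial, last_n=10):
    """Mean and standard deviation of the MAE over the last `last_n` trials."""
    tail = np.asarray(mae_per_trial, dtype=float)[-last_n:]
    return float(tail.mean()), float(tail.std())

if __name__ == "__main__":
    # Synthetic 2 s trial with 1000 samples and 6 joints (illustrative only)
    t = np.linspace(0.0, 2.0, 1000)
    desired = np.stack([np.sin(np.pi * t + k) for k in range(6)], axis=1)
    actual = desired + 0.02  # constant tracking offset on every joint
    per_joint, mean_mae = trial_mae(desired, actual)
    print("per-joint MAE:", per_joint)
    print("mean MAE over joints:", mean_mae)
```

Applying `stable_performance` to the sequence of per-trial mean MAE values gives a summary of the same form as the figures reported for the controller and for the position control baseline.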
D. Unstructured Interactions
Fig. 5. Behavioural evolution through eight-like trajectory trials (2 s). (a) Initial learning stage (t1=18-20 s). (b) Intermediate learning stage (t2=318-320 s). (c) Final learning stage (t3=998-1000 s). The first row depicts the cerebellar output activity (DCN layer), whereas the second row shows its analogue conversion into torque commands. The third row illustrates the desired vs. actual trajectory per joint. (d), (e), and (f) reveal the desired vs. actual trajectory of the end-effector in Cartesian space at t1, t2, and t3 respectively. Also the density functions corresponding to the prior 10 trials are depicted. (g) Represents the position Mean Absolute Error (MAE) per trial through the learning process. The MAE of each joint is illustrated as well as the average MAE of all joints, completed with the default factory-installed position control baseline performance.
TABLE III CIRCULAR AND EIGHT-LIKE TRAJECTORIES: LEARNING STAGES MAE
TABLE IV
Aiming to test the compliance of the cerebellar controller, we assessed its response in an unstructured environment. Whilst the robot performed the circular trajectory, several interactions were undertaken [Fig. 7]. First, the dynamics of the robotic arm were modified in two different ways: i) By adding a 0.5 kg payload, attached to the end-effector through a rod, mimicking a pseudo “conical pendulum”. The tension force of the rod acting on the robot varied with the alignment between the payload and the end-effector. ii) By attaching an elastic band, which applied an elastic force as it tried to return to its natural length. In both cases, the cerebellar-like controller successfully adapted to the new context after a learning period.
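As a rough indication of why these two perturbations change the load seen by the controller, the idealised textbook relations below may help. They are not taken from the experimental setup: a massless rod, steady circular motion of the payload, and a linear elastic band are assumed, and the symbols $\theta$ (angle between rod and vertical), $\omega$ (angular speed), $r$ (radius of the payload's circle), $k$ (band stiffness), and $\ell$, $\ell_0$ (current and natural band length) are introduced here only for illustration.

```latex
\begin{align}
  % Ideal conical pendulum: vertical and radial force balance on the payload
  T\cos\theta &= m g, \qquad T\sin\theta = m\,\omega^{2} r
  \quad\Longrightarrow\quad T = \frac{m g}{\cos\theta}, \\
  % With m = 0.5 kg and g \approx 9.81 m/s^2
  m g &\approx 0.5 \times 9.81 \approx 4.9\ \mathrm{N}
  \quad\Longrightarrow\quad T \approx \frac{4.9\ \mathrm{N}}{\cos\theta}, \\
  % Linear elastic band stretched beyond its natural length (Hooke's law)
  F_{\mathrm{band}} &= -k\,(\ell - \ell_0), \qquad \ell > \ell_0 .
\end{align}
```

Under these assumptions the rod tension grows as the rod departs from the vertical, and the band force grows with its elongation, so both perturbations act as position-dependent disturbances rather than as a constant offset.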
Subsequently, human interactions were performed: i) A human was able to move the robotic arm by applying an extremely low force (i.e. one-finger push). ii) A human grabbed the robotic arm and moved it through the working space with no opposition from the robot. iii) A human got in the way of the robotic arm trajectory with no risk of injury.
These results allow us to confirm that the cerebellar-like controller was able to accurately perform the desired trajectories regardless of the modifications to the dynamics, and that it guaranteed a safe human-robot interaction, as no damage was suffered on either the human or the robot side when the robot’s task was interrupted.
Four movies are included as supplementary material to fully illustrate the cerebellar learning and adaptation process. The target reaching, eight-like, and circular trajectory movies show, from top to bottom and left to right, the following clips, all playing synchronised RT information: i) a frontal shot of the robot performing the trajectory; ii) the evolution of the position MAE per trial; iii) a nadir shot of the robot performing the trajectory; iv) the trajectory performed by the end-effector in Cartesian space; v) the cerebellar output activity (DCN layer spikes); vi) the corresponding torque commands obtained from the spike-to-analogue conversion of the DCN activity. Different cuts corresponding to an initial, an intermediate, and a final learning stage show the behavioural evolution.
Finally, the unstructured environment movie shows the cerebellar adaptation, and therefore the robot adaptation, to unknown, unstructured scenarios, thus demonstrating compliance.
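The spike-to-analogue conversion of the DCN activity is not detailed in this section. Purely as an illustration of how spike trains can be turned into a continuous torque command, the sketch below low-pass filters the difference between two spike trains with an exponential kernel. The split into two opposing DCN groups, the function name `spikes_to_torque`, and the values of `dt`, `tau`, and `gain` are assumptions, not the conversion used in this work.

```python
import numpy as np

def spikes_to_torque(spikes_pos, spikes_neg, dt=0.002, tau=0.02, gain=1.0):
    """Convert two spike-count trains into an analogue torque trace.

    spikes_pos, spikes_neg: 1-D arrays of equal length with the number of
    spikes per time step from two hypothetical DCN groups that push the
    joint in opposite directions.
    dt: time step in seconds; tau: filter time constant in seconds;
    gain: torque contributed by a single spike (all assumed values).
    """
    spikes_pos = np.asarray(spikes_pos, dtype=float)
    spikes_neg = np.asarray(spikes_neg, dtype=float)
    decay = np.exp(-dt / tau)      # per-step decay of the exponential kernel
    torque = np.zeros_like(spikes_pos)
    trace = 0.0
    for i in range(len(spikes_pos)):
        # Leaky accumulation of the net (positive minus negative) spike count
        trace = decay * trace + (spikes_pos[i] - spikes_neg[i])
        torque[i] = gain * trace
    return torque

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pos = rng.poisson(0.3, size=1000)   # synthetic spike counts
    neg = rng.poisson(0.1, size=1000)
    tau_cmd = spikes_to_torque(pos, neg)
    print("mean torque command:", tau_cmd.mean())
```

A longer `tau` yields a smoother command at the cost of a slower response, which is one of the trade-offs any such conversion has to settle.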
Physical HRI implies controlling nonlinearities at the robotic end, thus demanding adaptive control. In this work, taking biology as an inspiration, we expand the family of RT adaptive robot controllers beyond machine learning [54], fuzzy logic [55, 56], and ANN [9, 57] solutions. We present a novel biologically plausible motor control architecture with a cerebellar-like SNN controller at its core that is able to drive a 6-DoF robot via torque commands in RT.
The intrinsic characteristics of SNNs, i.e. the timing-based codification of evolving sensorimotor states, make them an appealing approach for motor control architectures [11, 12]. However, computational cost has been the major drawback for implementing RT SNN controllers [58], constraining their applicability to hardware solutions of limited versatility [59, 60], simulated scenarios [56, 58], or RT operation with low-resolution control signals [61].
Here, this main issue has been overcome: an SNN of ~62 k neurons endowed with plasticity (36M plastic synapses) has been proven a valid RT robot controller. The implemented cerebellar plasticity mechanism (STDP) makes a detailed dynamic model of the robot dispensable. The cerebellar-like SNN is able to self-adapt and learn from scratch to control a given robot, making any prior knowledge of its dynamics unnecessary. Thus, the complexity of modelling nonlinear systems is tackled, and this SNN controller constitutes a plausible solution for controlling not only our Baxter robot but any torque-controlled robot. Previously achieved SNN position control [61, 62] does not provide compliance, as physical perturbations/interactions are not supported; hence the importance of reliable torque control for achieving safe physical HRI.
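For readers unfamiliar with STDP, the sketch below shows the textbook pair-based form of the rule, in which the sign and size of the weight change depend on the relative timing of pre- and postsynaptic spikes (see, e.g., [49]). It is not the cerebellar plasticity kernel used in this work, which is not specified in this section; the function names, amplitudes, time constants, and weight bounds are illustrative assumptions.

```python
import math

def stdp_dw(delta_t, a_plus=0.01, a_minus=0.012,
            tau_plus=0.020, tau_minus=0.020):
    """Weight change for one spike pair, with delta_t = t_post - t_pre (s).

    Pre-before-post (delta_t > 0) potentiates; post-before-pre depresses.
    All parameter values are assumed, generic defaults.
    """
    if delta_t > 0:
        return a_plus * math.exp(-delta_t / tau_plus)
    if delta_t < 0:
        return -a_minus * math.exp(delta_t / tau_minus)
    return 0.0

def apply_stdp(weight, pre_times, post_times, w_min=0.0, w_max=1.0):
    """Accumulate all-to-all pair contributions and clip the weight."""
    for t_pre in pre_times:
        for t_post in post_times:
            weight += stdp_dw(t_post - t_pre)
    return min(max(weight, w_min), w_max)

if __name__ == "__main__":
    w = 0.5
    w = apply_stdp(w, pre_times=[0.000, 0.050], post_times=[0.010, 0.045])
    print("updated weight:", w)
```

Because the update depends only on locally available spike times, a rule of this family can adapt the synapses online without an explicit model of the plant, which is the property the paragraph above relies on.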
Fig. 7. Performance in an unstructured environment. Whilst performing the already learnt circular trajectory a set of unstructured interactions were undertaken: i) A ½ kg payload was attached to the end-effector and later on detached. ii) An elastic band was attached to the end-effector and later on detached. iii) A series of physical Human-Robot interactions. The figure depicts the position MAE through trials as interactions are undertaken, illustrating the cerebellar adaptation to unknown scenarios.
Fig. 6. Behavioural evolution through target reaching trials (2 s). Each trial consisted of one of the eight possible tasks. (a) Initial learning stage (t1=158-160 s). (b) Intermediate learning stage (t2=598-600 s). (c) Final learning stage (t3=1998-2000 s). (a), (b), and (c) depict the last performed trajectory for each of the eight possibilities in Cartesian space prior to t1, t2, and t3 respectively. The density functions reveal the end-effector behaviour over the last 80 trials, grouping the eight possible tasks by trajectory direction. (d) Represents the position Mean Absolute Error (MAE) per trial through the learning process. The MAE of each joint is illustrated as well as the mean MAE of all joints. High standard deviation values reflect how some reaching movements were more demanding than others. The position control baseline is the average MAE of the default factory-installed position control under the same stochastic distribution over trials.
[18] E. Salinas and R. Romo, "Conversion of sensory signals into motor commands in primary motor cortex," J. Neurosci., vol. 18, pp. 499-511, 1998.
[19] J. J. Craig, "Force control of manipulators," in Introduction to Robotics: Mechanics and Control, 3rd ed. Upper Saddle River, NJ, USA: Pearson/Prentice Hall, 2005, pp. 317-38.
[20] A. Karniel and G. F. Inbar, "A model for learning human reaching movements," Biol. Cybern., vol. 77, pp. 173-83, 1997.
[21] N. Schweighofer, M. A. Arbib, and M. Kawato, "Role of the cerebellum in reaching movements in humans. I. Distributed inverse dynamics control," Eur. J. Neurosci., vol. 10, pp. 86-94, 1998.
[22] N. R. Luque, J. A. Garrido, R. R. Carrillo, E. D'Angelo, and E. Ros, "Fast convergence of learning requires plasticity between inferior olive and deep cerebellar nuclei in a manipulation task: a closed-loop robotic simulation," Front. Comp. Neurosci., vol. 8, p. 97, 2014.
[23] P. van der Smagt, "Benchmarking cerebellar control," Robot. Auto. Syst., vol. 32, pp. 237-51, 2000.
[24] H. Hoffmann, G. Petkos, S. Bitzer, and S. Vijayakumar, "Sensor-assisted adaptive motor control under continuously varying context," in ICINCO, Automation & Robotics, 2007.
[25] S. Chitta, I. Sucan, and S. Cousins, "MoveIt! [ROS Topics]," IEEE Robotics & Automation Magazine, vol. 19, pp. 18-9, 2012.
[26] C. Fitzgerald, "Developing baxter," in 2013 IEEE Conf. Tech. Practical Robot App. (TePRA), 2013, pp. 1-6.
[27] M. M. Williamson, "Series elastic actuators," 1995.
[28] N. P. Koenig and A. Howard, "Design and use paradigms for Gazebo, an open-source multi-robot simulator," in IROS, 2004, pp. 2149-2154.
[29] M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs, et al., "ROS: an open-source Robot Operating System," in ICRA workshop on open source software, 2009, p. 5.
[30] E. Ros, R. Carrillo, E. M. Ortigosa, B. Barbour, and R. Agís, "Event-driven simulation scheme for spiking neural networks using lookup tables to characterize neuronal dynamics," Neural Comp., vol. 18, pp. 2959-93, 2006.
[31] F. Naveros, N. R. Luque, J. A. Garrido, R. R. Carrillo, M. Anguita, and E. Ros, "A spiking neural simulator integrating event-driven and time-driven computation schemes using parallel CPU-GPU co-processing: a case study," IEEE T Neur. Net. Lear., vol. 26, pp. 1567-74, 2015.
[32] F. Naveros, J. A. Garrido, R. R. Carrillo, E. Ros, and N. R. Luque, "Event-and time-driven techniques using parallel CPU-GPU co-processing for spiking neural networks," Front. Neuroinf., vol. 11, p. 7, 2017.
[33] N. R. Luque, R. R. Carrillo, F. Naveros, J. A. Garrido, and M. Sáez-Lara, "Integrated neural and robotic simulations. Simulation of cerebellar neurobiological substrate for an object-oriented dynamic model abstraction process," Robot. Auto. Syst., vol. 62, pp. 1702-16, 2014.
[34] F. Naveros, N. R. Luque, E. Ros, and A. Arleo, "VOR Adaptation on a Humanoid iCub Robot using a Spiking Cerebellar Model," IEEE Trans. Cybern., accepted for publication, 2019.
[35] N. R. Luque, J. A. Garrido, J. Ralli, J. J. Laredo, and E. Ros, "From Sensors to Spikes: Evolving Receptive Fields to Enhance Sensorimotor Information in a Robot-Arm," Int. J. Neural Syst., vol. 22, p. 1250013, 2012.
[36] J. A. Garrido Alcazar, N. R. Luque, E. D’Angelo, and E. Ros, "Distributed cerebellar plasticity implements adaptable gain control in a manipulation task: a closed-loop robotic simulation," Front. Neural Circuits, vol. 7, p. 159, 2013.
[37] J. A. Garrido, N. R. Luque, S. Tolu, and E. D’Angelo, "Oscillation-driven spike-timing dependent plasticity allows multiple overlapping pattern recognition in inhibitory interneuron networks," Int. J. Neural Syst., vol. 26, p. 1650020, 2016.
[38] N. R. Luque, J. A. Garrido, R. R. Carrillo, O. J.-M. D. Coenen, and E. Ros, "Cerebellar-like corrective model inference engine for manipulation tasks," IEEE T. Syst. Man Cy. B, vol. 41, pp. 1299-312, 2011.
[39] J. A. Garrido, E. Ros, and E. D’Angelo, "Spike timing regulation on the millisecond scale by distributed synaptic plasticity at the cerebellum input stage: a simulation study," Front. Comp. Neurosci., vol. 7, p. 64, 2013.
[40] N. R. Luque, F. Naveros, R. R. Carrillo, E. Ros, and A. Arleo, "Spike burst-pause dynamics of Purkinje cells regulate sensorimotor adaptation," PLOS Comp. Biol., vol. 15, p. e1006298, 2019.
[41] N. R. Luque, J. A. Garrido, R. R. Carrillo, S. Tolu, and E. Ros, "Adaptive cerebellar spiking model embedded in the control loop: context switching and robustness against noise," Int. J. Neural Syst., vol. 21, pp. 385-401, 2011.
[42] M. Ito, "Cerebellar microcomplexes," in Inter. Review Neurobiol. vol. 41, ed: Elsevier, 1997, pp. 475-87.
[43] R. R. Carrillo, F. Naveros, E. Ros, and N. R. Luque, "A Metric for Evaluating Neural Input Representation in Supervised Learning Networks," Front. Neurosci., vol. 12, 2018.
[44] T. Yamazaki and S. Tanaka, "Computational models of timing mechanisms in the cerebellar granular layer," Cerebellum, vol. 8, pp. 423-32, 2009.
[45] T. Honda, T. Yamazaki, S. Tanaka, S. Nagao, and T. Nishino, "Stimulus-dependent state transition between synchronized oscillation and randomly repetitive burst in a model cerebellar granular layer," PLoS Comp. Biol., vol. 7, p. e1002087, 2011.
[46] T. Ishikawa, M. Shimuta, and M. Häusser, "Multimodal sensory integration in single cerebellar granule cells in vivo," eLife, vol. 4, p. e12916, 2015.
[47] S. Kuroda, K. Yamamoto, H. Miyamoto, K. Doya, and M. Kawato, "Statistical characteristics of climbing fiber spikes necessary for efficient cerebellar learning," Biol. Cybern., vol. 84, pp. 183-92, 2001.
[48] R. R. Carrillo, E. Ros, C. Boucheny, and O. J.-M. D. Coenen, "A real-time spiking cerebellum model for learning robot control," Biosystems, vol. 94, pp. 18-27, 2008.
[49] W. Gerstner and W. M. Kistler, Spiking neuron models: Single neurons, populations, plasticity: Cambridge university press, 2002.
[50] A. Sargolzaei, M. Abdelghani, K. K. Yen, and S. Sargolzaei, "Sensorimotor control: computing the immediate future from the delayed present," BMC bioinformatics, vol. 17, p. 245, 2016.
[51] M. Kawato and H. Gomi, "A computational model of four regions of the cerebellum based on feedback-error learning," Biol. Cybern., vol. 68, pp. 95-103, 1992.
[52] M. Gerwig, K. Hajjar, A. Dimitrova, M. Maschke, F. P. Kolb, M. Frings, et al., "Timing of conditioned eyeblink responses is impaired in cerebellar patients," J. Neurosci., vol. 25, pp. 3919-31, 2005.
[53] D. L. Mills, "Internet time synchronization: the network time protocol," IEEE T. Comm., vol. 39, pp. 1482-93, 1991.
[54] S. Wang, W. Chaovalitwongse, and R. Babuska, "Machine learning algorithms in bipedal robot control," IEEE T Syst. Man Cy. C, vol. 42, pp. 728-43, 2012.
[55] J. P. Hwang and E. Kim, "Robust tracking control of an electrically driven robot: adaptive fuzzy logic approach," IEEE T. Fuzzy Syst., vol. 14, pp. 232-47, 2006.
[56] Z. Bing, C. Meschede, K. Huang, G. Chen, F. Rohrbein, M. Akl, et al., "End to end learning of spiking neural network based on r-stdp for a lane keeping vehicle," in 2018 IEEE Int. Conf. Robotics and Automation (ICRA), 2018, pp. 1-8.
[57] D. D. Ligutan, A. C. Abad, and E. P. Dadios, "Adaptive Robotic Arm Control using Artificial Neural Network," in 2018 IEEE Int. Conf. Human. Nanotechnol. Inf. Tech. Comm. Control, Environment and Manag. (HNICEM), 2018, pp. 1-6.
[58] A. Bouganis and M. Shanahan, "Training a spiking neural network to control a 4-dof robotic arm based on spike timing-dependent plasticity," in 2010 Int. Joint Conf. Neural Networks (IJCNN), 2010, pp. 1-8.
[59] S. B. Furber, F. Galluppi, S. Temple, and L. A. Plana, "The spinnaker project," Proc. IEEE, vol. 102, pp. 652-65, 2014.
[60] M. Hulea and C. F. Caruntu, "Spiking neural network for controlling the artificial muscles of a humanoid robotic arm," in 2014 Int. Conf. Syst. Theory, Control Comp. (ICSTCC), 2014, pp. 163-8.
[61] A. Antonietti, D. Martina, C. Casellato, E. D’Angelo, and A. Pedrocchi, "Control of a Humanoid NAO Robot by an Adaptive Bioinspired Cerebellar Module in 3D Motion Tasks," Comp. Intel. Neurosci., vol. 2019, 2019.
[62] R. de Azambuja, A. Cangelosi, and S. V. Adams, "Diverse, noisy and
and secondary education teaching from the U. Granada (Spain) in 2015 and 2016 respectively. In 2018, he joined the Applied Computational Neuroscience Research Group of the U. Granada (ACN-UGR), awarded a Young Researchers fellowship. His main research interests include neuromorphic engineering, spiking neural networks, brain-computer interfaces and motor control.
Francisco Naveros received two M.Sc. degrees, in telecommunication and in computer science and networks, in 2011 and 2012 respectively. He also holds a Ph.D. degree in computational neuroscience from the U. Granada (Spain), 2017. He has been a postdoctoral researcher since 2017 at ACN-UGR. He is the author of 8 articles. His main research interests include biologically processing control schemes, parallel and real-time spiking neural networks, and lightweight robots.
earned his M.Sc. degree in computer sciences in 2006 and his M.Sc. and Ph.D. degrees in computer engineering and networks in 2007 and 2011 respectively, all from the U. Granada (Spain). From 2012 to 2015, he was with the Brain and Behavioral Science department at U. Pavia (Italy) under the supervision of Prof. D’Angelo. In 2015, he was awarded a Young Researchers Fellowship by U. Granada. From 2016 to 2019, he held an IF Marie Curie Post-Doc Fellowship from the EU in the ACN-UGR. He is the author of more than 25 articles. His main research interests include cerebellar information processing and learning, motor control, neuromorphic engineering, and spiking neural networks.
Eduardo Ros received his M.Sc. and Ph.D. degrees in physics and computational neuroscience from the U. Granada (Spain) in 1992 and 1997 respectively. He is currently Full Professor in the Dept. of Computer Architecture and Technology of U. Granada. He is the head of the ACN-UGR group. He is the author of more than 85 scientific articles. His main research interests include bio-inspired processing, neuromorphic engineering, spiking neural networks and computational neuroscience.
Niceto R. Luque was awarded his M.Sc. and Ph.D. degrees in computer science and networks from U. Granada (Spain) in 2007 and 2013 respectively. He also received a B.Sc. in electronics and an M.Sc. in automatics and industrial electronics from U. Córdoba (Spain) in 2003 and 2006, respectively. From 2015 to 2017, he held an IF Marie Curie fellowship from the EU in Dr. Arleo’s lab in Paris. In 2018, he obtained a Juan de la Cierva Incorporation Post-Doc fellowship from the Spanish Government in the ACN-UGR. He is the author of more than 20 articles. His main research interests include biologically processing control, spiking neural networks and ageing.
Fig. 1. Schematic of the Cerebellar closed-loop control. The Mossy fibres (MFs) convey the sensory signals, whilst the climbing fibres (CFs) convey the instructive signals, thus providing the inputs to the cerebellar network. The deep cerebellar nuclei (DCN) drive the cerebellar torque output commands. MFs project sensorimotor information onto granular cells (GCs) and DCN. GCs, in turn, project onto Purkinje cells (PCs) through parallel fibres (PFs). PCs also receive excitatory inputs from the CFs. Finally, DCN receive excitatory inputs from the MFs and CFs and inhibitory inputs from the PCs.
Fig. 2. Cerebellar scheme. Schematic representation of the main neural layers, cells, connections, and the plasticity site considered in the cerebellar model.