Learning Sensorimotor Agency
in Cellular Automata
Finding robust self-organizing "agents" with gradient descent and curriculum learning: individuality, self-maintenance and sensori-motricity within a cellular automaton environment
What you will find in this blog 📝
Abstract
Novel classes of Cellular Automata (CA), able to generate a high diversity of complex self-organized patterns from local update rules, have recently been introduced in the Artificial Life (ALife) community. These patterns can display certain properties of biological systems, such as a spatially localized organization or directional and rotational movements. In fact, CA have a long relationship with biology, and especially with the origins of life and cognition, as they are self-organizing systems that can serve as computational testbeds and toy models for such theories.
Interactive Demo

(Interactive demo: choose between the "Zoomed in", "Multi creature" and "Maze" presets, and adjust the brush size and the radius of the kernels, which sets the size of the creature.)
Introduction: Connecting the dots between the Mechanistic and Enactivist views of Cognition.
Understanding what has led to the emergence of life, cognition and natural agency as we observe them in living organisms is one of the major scientific quests in a variety of disciplines, ranging from biology and chemistry to evolutionary science. The pragmatic complementary question, central in disciplines such as artificial life (ALife) and artificial intelligence (AI), is: can we engineer the necessary ingredients to discover forms of functional life and cognition as-it-could-be in an artificial substrate? With respect to this practical goal of building agents-as-they-could-be, key challenges include the modeling of an artificial environment.
In the mechanistic view, robots and other virtual agents are referred to as "embodied" if they can ground their sensorimotor capabilities in the environment (the external world) via a physical interface (the body), allowing them to experience the world directly (sensory inputs) and to act upon it (motor outputs) using internal input-output information processing (the brain). Embodiment here is opposed to the computational non-embodied perspective, where internal representations, either symbol-based in "good old-fashioned AI" or neural-network-based in "internet AI", are decoupled from the external world and lack situatedness.
The clear body/brain/environment distinction of the mechanistic framework bears little resemblance to biological examples of brainless organisms using their body both for sensing and for computing decisions. Plants move to get more sun; slime molds use mechanical cues of their environment to choose in which direction to expand.
The enactive view on embodiment, however, is rooted in the bottom-up organizational principles of living organisms in the biological world. The modeling framework typically uses tools from dynamical and complex systems theory, where an artificial system (the environment) is made of low-level elements of matter \(\{a_i\}\) (called atoms, molecules or cells) described by their inner states (e.g. energy level) and locally interacting via physics-like rules (flow of matter and energy within the elements). There is no predefined notion of agent embodiment; instead, the body of the agent must come into existence through the coordination of the low-level elements and must operate under precarious conditions.
Whereas both the mechanistic and the enactivist frameworks agree on agents as entities with some form of goal-directedness and action response to external perturbations, we can see how the characterization of agents in self-organizing systems is non-intuitive and very challenging in practice. Some recent works have proposed rigorous quantitative measures of individuality based on information-theoretic tools.
In the work presented here, following the enactivist modeling framework, we initially only assume environments made of atomic elements and physical laws, and we try to answer the following scientific question: is it possible to find environments in which a subpart could exist/emerge and be called a "sensorimotor agent"? To do so, we use a continuous cellular automaton called Lenia.
In the first section, we explain how we made the Lenia framework as differentiable-friendly as possible in order to efficiently search for CA rules. The transition toward differentiable dynamics was recently proposed in the context of cellular automata.
In the second section, we propose a method based on gradient descent and curriculum learning, combined within an intrinsically-motivated goal exploration process (IMGEP).
In the third section, we explain how our environment's physical rules can integrate both predetermined specific properties and learnable properties. That "trick" of controlling subparts of the environmental physics allows us to build a curriculum of tasks for optimizing the learnable part of the environment, in which we search for parameters that self-organize sensorimotor agents robust to stochastic variations of the environmental constraints. Environment design, by shaping the search process, allows us to discover more advanced forms of sensorimotor capabilities such as self-maintenance and adaptivity to the surroundings.
Finally, in the last section, we investigate the (zero-shot) generalization of the discovered sensorimotor agents to several out-of-distribution perturbations that were not encountered during training. Impressively, even though the agents still fail to preserve their integrity in certain configurations, they show very strong robustness to most of the tested variations. The agents are able to navigate in unseen and harder environmental configurations while self-maintaining their individuality. Not only are the agents able to recover their individuality when subjected to external perturbations, but also when subjected to internal perturbations: they resist variations of the morphogenetic processes such as less frequent cell updates, quite drastic changes of scale, and changes of initialization. Furthermore, when tested in a multi-entity initialization, and despite having been trained alone, the agents not only preserve their individuality but also show forms of coordinated interactions (attraction and reproduction), interactions that have been coined communicative interactions.
Searching for rules at the cell level in order to give rise to higher-level cognitive processes at the level of the organism, and at the level of the group of organisms, opens many exciting opportunities for the development of embodied approaches in AI in general.
Our results suggest that, contrary to the (still predominant) mechanistic view on embodiment, biologically-inspired enactive embodiment could pave the way toward agents with strong coherence and generalization to out-of-distribution changes, mimicking the remarkable robustness of living systems in maintaining specific functions despite environmental and body perturbations.
The system
Cellular automata are, in their classic form, a grid of "cells" \( A = \{ a_x \} \) that evolve through time \( A^{t=1} \rightarrow \dots \rightarrow A^{t=T} \) via local "physics-like" laws. More precisely, the cells sequentially update their state based on the states of their neighbours: \( a_x^{t+1}= f(a_x^t,\mathcal{N}(a_x^t))\), where \( x \in \mathcal{X}\) is the position of the cell on the grid, \(a_x \) is the state of the cell, and \(\mathcal{N}(a_x^t)\) is the neighbourhood of the cell. The dynamics of the CA are thus entirely defined by the initialization \( A^{t=1} \) (the initial state of the cells in the grid) and the update rule \( f \) (how a cell updates based on its neighbours). But predicting the long-term behavior of a CA is a difficult challenge, even for simple rules, due to chaotic dynamics.
Videos: a stable spatially localized pattern (SLP) and a moving pattern.
The Game of Life is one example of a cellular automaton with binary states, where cells can either be dead (\(a_x=0\)) or alive (\(a_x=1\)). Despite its very simple rule \(f\), very complex structures can emerge in it. One main type of pattern studied in the Game of Life is stable spatially localized patterns (SLP): patterns with a kind of spatial boundary that separates the unity from the rest. The subcategory of moving patterns, SLPs that periodically return to their state after some timesteps but shifted in space, is of particular interest. The well-known glider, as shown on the right, was even proposed as a computational model of an autopoietic system.
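To make the abstract update \( a_x^{t+1}= f(a_x^t,\mathcal{N}(a_x^t))\) concrete, here is a minimal sketch of the Game of Life rule (our illustration, not code from this work), written with JAX since later sections rely on differentiable CA updates:

```python
import jax.numpy as jnp

def life_step(A):
    """One Game of Life update: a_x^{t+1} = f(a_x^t, N(a_x^t)).

    A is a 2D grid of 0s (dead) and 1s (alive); the neighbourhood N(a_x)
    is the 8 surrounding cells (toroidal boundaries via jnp.roll).
    """
    # Count alive neighbours by summing the 8 shifted copies of the grid.
    neighbours = sum(
        jnp.roll(A, (dy, dx), axis=(0, 1))
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # Birth on exactly 3 alive neighbours; survival on 2 or 3.
    return ((neighbours == 3) | ((A == 1) & (neighbours == 2))).astype(A.dtype)
```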
The cellular automaton we study in this work is Lenia.
A wide variety of complex patterns have been found in Lenia, using a combination of hand-made exploration/mutation and evolutionary algorithms.
Creatures obtained by hand-made random exploration.
The creatures at the top are orbiums in Lenia.
Finding creatures like these can take time (and expert knowledge), especially for more complicated ones.
You can find a library of creatures found in the first version of Lenia.
The moving creatures found are long-term stable and can have interesting interactions with each other, but some, such as the orbium (which you can find in the two upper videos), are not very robust, for example to collisions with each other. Other, more complex creatures (as shown in the two bottom videos) seem to resist collisions better and to be able to sense the other creatures. These creatures show sensorimotor capabilities, as they change direction in response to interactions with other creatures.
However, all of the previous methods use only random mutations and manual tuning to find these patterns, which can be computationally heavy, especially when searching for very specific functionalities or in a high-dimensional parameter space. This motivates our choice to make Lenia differentiable, which then allows us to take advantage of gradients to find, in a more efficient and systematic way, the parameters leading to the emergence of agents with similar types of behaviors.
In this section, we first explain the Lenia model in more detail and present how we made parts of it differentiable.
Lenia
In Lenia, the system is composed of several communicating grids \( A=\{ A_c\}\), which we call channels. The above video illustrates Lenia's "physics" in a 2-channel example (\(A_1\) is colored in yellow and \( A_2 \) in blue). In each of these grids, every cell/pixel can take any value between 0 and 1. Cells at 0 are considered dead, while the others are alive. As shown in the video, the channels are updated in parallel according to their own physics rules. Intuitively, we can see channels as the domain of existence of a certain type of cell. Each type of cell has its own physics: its own way to interact with other cells of its type (intra-channel influence) and its own way to interact with cells of other types (cross-channel influence).
The update of a cell \( a_{x,c}\) at position \(x\) in channel \(c\) can be decomposed into three steps. First, the cell senses its neighbourhood across channels (in its own channel, with cells of the same type, but also in other channels, with other types of cells) through convolution kernels, which are filters \(K_k\) of different shapes and sizes. Second, the cell converts this sensing into an update (a positive, negative or neutral growth) through growth functions \(G_k\) associated with the kernels. Finally, the cell modifies its state by summing the scalars obtained from the growth functions and adding the result to its current state. After every rule has been applied, the state is clipped between 0 and 1. Each (kernel, growth function) couple is associated with the source channel \(c_s\) it senses and with the target channel \(c_t\) it updates. A (kernel, growth function) couple characterizes a rule for how a type of cell \(c_t\) reacts to its neighbourhood of cells of type \(c_s\). Note that \(c_s\) and \(c_t\) can be the same, which corresponds to interactions between cells of the same type (intra-channel influence). Note also that we can have several rules characterizing the interaction between \(c_s\) and \(c_t\), i.e. \(n(c_s \rightarrow c_t)\) (kernel, growth function) couples.
A local update in the grid is summarized with the following formula:
$$a_x^{t+1}=f(a_x^t, \mathcal{N}(a_x^t)) = \begin{bmatrix} a^t_{x,c_0} + \sum_{c_s=0}^{C} \sum_{k=0}^{n(c_s \rightarrow c_0)-1} G^k_{c_s \rightarrow c_0} \left( K^k_{c_s \rightarrow c_0} \left(a^t_{x,c_s}, \mathcal{N}_{c_s}(a^t_x)\right)\right) \\ \vdots \\ a^t_{x,c_C} + \sum_{c_s=0}^{C} \sum_{k=0}^{n(c_s \rightarrow c_C)-1} G^k_{c_s \rightarrow c_C} \left( K^k_{c_s \rightarrow c_C} \left(a^t_{x,c_s}, \mathcal{N}_{c_s}(a^t_x)\right)\right) \end{bmatrix} $$
For each rule, the shapes of the kernel and of the growth function are parametrized. We are thus able to "tune" the physics of the cells and of their interactions by changing the kernel shape (how the cells perceive their neighborhood) as well as the growth function shape (how the cells react to this perception).
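As a rough illustration of how such a parametrized rule can be implemented, here is a single-channel sketch with FFT-based convolutions. The parameter names (R, r, b, rk, w, m, s, h, T) follow the formulas given later in the obstacle section, but the helper names are ours and this is a reconstruction, not the exact code of this work:

```python
import jax
import jax.numpy as jnp

def ring_kernel(grid_size, R, r, b, rk, w):
    """A kernel made of concentric Gaussian bumps (cf. the K_k formula below).

    R: maximum kernel radius; r: relative radius; b, rk, w: heights, radial
    positions and widths of the bumps. The kernel is laid out on the full grid
    with its centre at (0, 0), so FFT convolution is a pointwise product.
    """
    y, x = jnp.mgrid[:grid_size, :grid_size]
    y = jnp.minimum(y, grid_size - y)        # wrapped coordinates (torus)
    x = jnp.minimum(x, grid_size - x)
    d = jnp.sqrt(x**2 + y**2) / (r * R)      # distance rescaled by the radius
    bumps = sum(bi * jnp.exp(-((d - ri) ** 2) / (2 * wi**2))
                for bi, ri, wi in zip(b, rk, w))
    K = bumps * jax.nn.sigmoid(-10.0 * (d - 1.0))   # smooth cut-off at d = 1
    return K / K.sum()                       # normalize the kernel's mass

def growth(u, m, s):
    """Gaussian growth function mapping the sensed value u into [-1, 1]."""
    return 2.0 * jnp.exp(-((u - m) ** 2) / (2 * s**2)) - 1.0

def lenia_step(A, Ks, ms, ss, hs, T=10.0):
    """One single-channel Lenia update: sense (convolve), grow, clip."""
    fA = jnp.fft.fft2(A)
    total = jnp.zeros_like(A)
    for K, m, s, h in zip(Ks, ms, ss, hs):
        u = jnp.real(jnp.fft.ifft2(fA * jnp.fft.fft2(K)))  # sensed value
        total = total + h * growth(u, m, s)                # weighted growth
    return jnp.clip(A + total / T, 0.0, 1.0)
```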
However, finding parameters leading to the emergence of localized patterns, let alone moving ones, is not easy. For example, a random search of 100 trials (with 1 channel and 10 rules) gives only 3-5 SLPs and no moving creature. Even with more advanced diversity-driven exploration searches, moving creatures are hard to find.
Random exploration of the parameter space rarely results in moving creatures.
Each of the 100 squares is a random parameter trial (each with 1 channel and 10 rules, i.e. \(\sim\) 130 parameters per square). The parameters control the local rules of interaction between cells, by changing the kernel shape (how a cell senses) and the growth function (how this sensing is converted into growth).
Differentiable Lenia
Due to the locality and recurrence of the update rule, there is a close relationship between cellular automata and recurrent convolutional networks.
However, the classic version of Lenia was not designed with gradient-based optimization in mind; we adapted its operations (e.g. using smooth, differentiable kernel and growth functions) so that gradients can flow through the update rule and the whole roll-out.
How to discover spatially localized and moving agents in Lenia?
In this section, we propose tools based on gradient descent and curriculum learning to learn the CA parameters leading to the emergence of moving creatures. Finding glider-like creatures will be the basis on which we'll build the method leading to the emergence of sensorimotor capabilities within the Lenia dynamics in the next section.
In this study, we learn to develop morphology and motricity at the same time. The CA rule will both be applied to grow the creature from an initial state and be the "physics" that makes it move.
Note that moving creatures in a cellular automaton differ from other types of movement such as motors, muscle contractions or soft robots.
In this section, we only work with 1 channel (only 1 type of cells interacting). In this channel, we want to learn the several rules (parameters of the kernels and growth functions encoding the interactions of those cells within the channel) that will result in the emergence of moving creatures when given a proper initialization. At the same time, we also aim to learn an initialization that is adapted to these rules, i.e. that leads to the emergence of the creature.
We start by randomly sampling the parameters and initialization until we get a localized pattern, meaning it doesn't spread to the whole grid and doesn't die. This obtained pattern will most of the time stay at the same position. What we want is to change the parameters and initialization such that this pattern ends up further in the grid, meaning that it survived, stayed localized, but moved to a different location within a few timesteps. Because our system is differentiable, we are able to backpropagate through the timesteps by "unfolding" the roll-out. We therefore need a loss applied on the roll-out that encourages movement to a new position.
Schematic view of an optimization step.
Differentiable Lenia allows us to optimize the CA parameters such that its dynamics converge towards a target pattern. Here, the figure shows optimization of the system with the MSE error between a target image and the system state at the last timestep. Different training losses could be envisaged and applied at intermediate timesteps, depending on the dynamical properties one aims to see emerge in the system.
A target image with the MSE error applied at the last timestep of a rollout seems effective for learning CA rules leading to a certain pattern.
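A minimal sketch of this optimization step, reusing the hypothetical lenia_step from the sketch above (the actual training loop differs in its details):

```python
import jax
import jax.numpy as jnp

def rollout_loss(params, A0, target, n_steps=50):
    """Unroll the CA and compare the last state to the target pattern (MSE)."""
    A = A0
    for _ in range(n_steps):            # "unfolding" the roll-out
        A = lenia_step(A, *params)      # lenia_step as sketched above
    return jnp.mean((A - target) ** 2)

# Every operation in the roll-out is differentiable, so gradients flow back
# through all timesteps, both into the rule parameters and the initial state:
grad_fn = jax.grad(rollout_loss, argnums=(0, 1))
param_grads, init_grad = grad_fn(params, A0, target)
```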
The first target shape we tried was a single disk, with the idea of getting a spatially localized agent contained within the disk, as the target shape is finite. However, after seeing that the robust creatures obtained seemed to have a "core" and a shallow envelope, we informally chose to move to two superposed discs: a large shallow one with a thick, smaller one on top. The resulting target shape has the formula \(0.1*(R<1)+0.8*(R<0.5)\). We chose on purpose to have the sum smaller than 1, to avoid killing the gradient due to the clip operation.
Beyond the choice of the target shape, the choice of the target location is crucial for the success of the optimization. Simply putting a target shape far from the initialization and optimizing towards it does not work most of the time. In fact, it works only when the target is not too far from where the creature ended before optimization (more precisely, when it overlaps it a little). This comes from the fact that cells at 0 do not give gradients, as we clip between 0 and 1; for example, if at the last timestep the target shape lies in an area where the cell states are clipped at 0, no gradient will be propagated. Moreover, as the system is complex/chaotic, the optimization landscape is very rugged, and changing some parameters too much can easily break the dynamics, leading to completely different outcomes (losing all the progress and making further optimization very hard). Putting the target at a close position should therefore lead to an easier optimization landscape as well as more gradient information (because more non-clipped pixels overlap the target), leading to better optimization steps with fewer chances to diverge. To exploit these small steps, we propose to use curriculum learning, shaping the optimization to aim for near-enough (and increasingly further) target shapes.
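For concreteness, the target pattern can be built as follows (a hypothetical helper implementing the \(0.1*(R<1)+0.8*(R<0.5)\) formula, with \(R\) the distance to the goal position normalized by the disc radius):

```python
import jax.numpy as jnp

def target_pattern(grid_size, goal, radius):
    """Two superposed discs centred at the goal position.

    The values sum to 0.9 < 1 on purpose, so the final clip to [0, 1]
    does not kill the gradient where the target is placed.
    """
    y, x = jnp.mgrid[:grid_size, :grid_size]
    R = jnp.sqrt((y - goal[0]) ** 2 + (x - goal[1]) ** 2) / radius
    return 0.1 * (R < 1) + 0.8 * (R < 0.5)
```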
Result of one optimization step.
Red: target; yellow: initialization; green: agent at the last timestep.
Left to right is one optimization step: the agent learns to go a little bit further in the same amount of time.
Curriculum-driven goal exploration process
The effectiveness of curriculum learning on complex tasks has already been shown in Wang et al.
However, defining a learning curriculum (in our case, defining how far and in which direction the target should be pushed at each optimization phase) is not trivial. In fact, some locations can lead to a hard optimization landscape (e.g. with a danger of getting trapped in local minima or of diverging), while other locations, by luck, can make the optimization easy. And these "easy" targets change with every random initialization.
To tackle these challenges, we propose to rely on intrinsically-motivated goal exploration processes (IMGEPs), an algorithmic process which has been shown to successfully generate learning curricula for complex exploration spaces in robotics.
IMGEP Step
An IMGEP is an algorithmic process that samples new goals and tries to achieve them, reusing knowledge from previous trials. In our case, we use an IMGEP because it automatically builds a curriculum by randomly trying new goals.
In our case, the goal space is simply a 2-dimensional vector space representing the position of the center of mass of the creature. Hence, a policy in Lenia (controlling the CA initialization and rules) achieves a target goal when it produces a creature whose position at the last timestep (here t=50) is within an accepted range of the target. While there exist many goal-sampling strategies in the IMGEP literature, we use here a simple version that randomly samples positions in the grid, biasing the sampling toward one edge of the grid (in order to obtain moving creatures) while taking care that the sampled goals are not too far from already-attained positions. To attain a new target position/goal, the goal-achievement policy relies on (i) the history of previously-tried policies, to select the parameters that performed best (achieved the closest position); and (ii) an inner loop that uses gradient descent with the MSE error between the selected policy's last state and the target shape centered at the target goal. There are therefore two loops: an outer one setting the goals and an inner one applying several steps of gradient descent toward each goal. The overall method can be summarized as follows:
Perform random policies in Lenia, saving the obtained (parameters, reached goal) tuples in history \(\mathcal{H} = (p_i,rg_i)_{i=1,..,s}\)
Loop (number of IMGEP steps):
  Sample a target position/goal (not too far from reached positions in the history \(\mathcal{H}\))
  Select, from the history, the parameters that achieved the closest position/goal
  Initialize the system with those parameters
  Loop (number of optimization steps):
    Run Lenia
    Gradient descent toward the target shape at the target position to optimize the parameters
    Initialize the system with those optimized parameters
  Run Lenia one more time to see which position (i.e. goal) is achieved
  If the creature died or exploded, don't save
  Else, add to the history the parameters resulting from optimization and the outcome/goal reached: \(\mathcal{H} =\mathcal{H} \cup (p^\star,rg )\)
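In Python-like pseudocode, the outer loop could look as follows (a minimal sketch; the helpers random_policy, run_lenia, sample_goal, distance, gd_step and died_or_exploded are hypothetical and stand for the corresponding steps above):

```python
def imgep(n_bootstrap, n_imgep_steps, n_opt_steps):
    """Sketch of the goal exploration loop; all helpers are hypothetical."""
    # Bootstrap: random policies and the goals (final positions) they reach.
    history = [(p, run_lenia(p))
               for p in (random_policy() for _ in range(n_bootstrap))]

    for _ in range(n_imgep_steps):
        g = sample_goal(history)  # biased toward an edge, near reached goals
        # Warm-start from the parameters whose outcome was closest to g.
        p = min(history, key=lambda pr: distance(pr[1], g))[0]
        for _ in range(n_opt_steps):
            # Inner loop: gradient descent on the MSE between the last state
            # and the target shape centred at g (see the previous sketches).
            p = gd_step(p, g)
        reached = run_lenia(p)    # run once more with the optimized policy
        if not died_or_exploded(reached):
            history.append((p, reached))
    return history
```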
An advantage of IMGEPs is that the information collected when a policy "fails" (e.g. reaches a position far from the selected target) can be useful later on for reaching other positions: it might still make a small improvement, or it might reach a completely different area that we may want to explore. The fact that we don't always select the last checkpoint, as in classic curriculum learning, also gives us different "lineages", which may help to avoid getting stuck in local minima or in an area where the optimization can easily diverge.
Moving creatures obtained
Robustness of learned moving creatures over longer time spans than in the training process.
Each square is the result of 1 trial of the method. The video on the left displays successes of the method, where the creatures obtained are long-term stable. The video on the right displays trials where the creatures obtained die a few steps after the number of timesteps they were trained on. The last creature on the right tries as much as possible to fit the target at the last seen timestep, resulting in its death right after.
The method proposed above gives us a set of rules and an initialization in Lenia that lead to the emergence of a moving creature. The obtained rules and initialization are different every time we run the search, as the method's initialization (first line in the pseudocode) is random. This results in different creatures emerging with every set of obtained rules (different seeds of the method). Interestingly, some of the emerged creatures are long-term stable (their shape stays stable), while others may become unstable after a few timesteps. In fact, as a creature is only trained for 50 timesteps, it can behave unpredictably when run for longer. Seeing that the majority of creatures that emerge from training are long-term stable (8 out of 10 trials with initialization selection; see the appendix for more on initialization selection), whereas this is not specified/penalized in the training loss, is a first hint of the generalization capabilities of self-organizing agents.
Results of successive outer (top) and inner (bottom) optimization steps in practice, in 2 different runs.
This figure shows how the bi-level optimization (an inner loop inside an outer loop) progressively evolves moving creatures. For each run, the upper-left video corresponds to the creature obtained after initialization of the parameters (before optimization), which is not able to move at all. After several outer steps (shown at the top and separated by blue lines), we can see how the evolved creatures improve their behavior, with the upper-right one (at the end of the bi-level optimization) quickly reaching the edge of the grid. At the bottom, a "zoom" (represented by the red lines) on the first outer step (between the first and second top videos) shows in more detail the successive small improvements made by the inner loop (gradient steps) during the first outer step. Each inner step slightly improves the creature by making it go a little further. At the end of these inner steps, we get the result of the first outer step (second top video).
Can we learn robust creatures with sensorimotor capabilities?
In the previous section, we have shown how to learn rules in Lenia leading to the emergence of agents with moving capacity. However, this was done in a neutral environment, where agents did not have to cope with any external perturbations. When we talk about sensorimotor capability, we expect agents that are able to robustly achieve goals (such as moving toward the opposite edge of the CA grid) under a variety of environmental conditions, involving the processes of sensing the environment and acting upon it. To find sensorimotor-capable agents, we want to train them on a variety of tasks that are specified not only by a goal (a 2D position on the CA grid) but also by an environmental configuration (everything that is "outside" the agent and challenges goal achievement). Training agents on a curriculum of tasks (a curriculum of goals, but also a curriculum of environmental configurations per goal) has been shown to foster the emergence of generally capable agents.
In this work, we focus on modeling obstacles in the environment's physics and propose to probe the agent's sensorimotor capability as its performance in moving forward under a variety of obstacle configurations. This section explains how we model agent-obstacle interactions in Lenia and how our training method integrates the generation of a curriculum of stochastic goals and obstacle configurations, leading to the emergence of sensorimotor-capable creatures.
Modeling agent-environment interactions in Lenia: the example of obstacles.
$$ A^{t+1}= \left[A^t +\frac{1}{\textcolor{#00c8c8}{T}} \left(G_{wall}(K_{wall}*A^t_1) +\sum_k \textcolor{#008000}{h^k} G_k(K_k * A^t_0) \right) \right]^1_0 $$
$$ K_k= x \rightarrow \left( \sum_i^{n} \textcolor{#0000c8}{b_i^k} exp(-\frac{(\frac{x}{\textcolor{#0000c8}{r^k} \textcolor{#baba40}{R}}-\textcolor{#0000c8}{rk^k_i})^2}{2(\textcolor{#0000c8}{w^k_{i}})^2}) \right) sigmoid(-10(\frac{x}{\textcolor{#0000c8}{r^k} \textcolor{#baba40}{R}}-1)) $$
$$ K_{wall}= x \rightarrow exp(-\frac{(\frac{x}{2})^2}{2}) sigmoid(-10(\frac{x}{2}-1)) $$
$$ G_k= x \rightarrow 2*exp(-\frac{(x-\textcolor{#c80000}{m^k})^2}{2(\textcolor{#c80000}{s^k} )^2})-1 $$
$$ G_{wall}= x \rightarrow -10 max(0,(x-0.001)) $$
The sigmoid term is only a smooth and differentiable version of \( \mathbb{1}_{x\leq r^k R} \).
Parameters:
R maximum radius of a kernel
T time scale
For each kernel:
- w \( \in [0,1]^n \) width of the Gaussian bumps
- b \( \in [0,1]^n \) height of the Gaussian bumps
- rk \( \in [0,1]^n \) shift of the Gaussian bumps from the center of the kernel
- r \( \in [0,1] \) relative radius
- m mean in growth function
- s variance/size in growth function
- h \( \in [0,1] \) weight of the kernel
Update step in our Lenia System with obstacle channel
The obstacle channel is another parallel grid, which allows us to put obstacles in the environment by setting some of its pixels to 1. These obstacles have a direct impact (through a fixed local rule) on the learnable channel, as they prevent any growth of the creature where obstacles are present. The corresponding Lenia equations are shown above.
The multi-channel aspect of Lenia allows the implementation of different types of cells/particles. To implement obstacles in Lenia, we added a separate "obstacle" channel with a kernel going from this channel to the learnable "creature" channel. This kernel triggers a severe negative growth in the pixels of the learnable channel where there are obstacles, but has no impact on pixels without obstacles (it is a very localized kernel). This way, we prevent any growth in the pixels of the learnable channel where obstacles are present.
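Following the \(K_{wall}\) and \(G_{wall}\) formulas above, this interaction can be sketched as follows (our reconstruction; the helper names are hypothetical):

```python
import jax
import jax.numpy as jnp

def wall_kernel(grid_size):
    """Very localized kernel sensing the obstacle channel (K_wall above)."""
    y, x = jnp.mgrid[:grid_size, :grid_size]
    y = jnp.minimum(y, grid_size - y)     # centre at (0, 0), torus layout
    x = jnp.minimum(x, grid_size - x)
    d = jnp.sqrt(x**2 + y**2) / 2.0       # the x/2 term in the formula
    return jnp.exp(-d**2 / 2.0) * jax.nn.sigmoid(-10.0 * (d - 1.0))

def wall_growth(u):
    """Severe negative growth wherever obstacle mass is sensed (G_wall above)."""
    return -10.0 * jnp.maximum(0.0, u - 0.001)

# In the learnable channel's update, this term is simply added to the sum of
# the creature's own (kernel, growth) couples before dividing by T and clipping.
```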
a) The orbium (a glider-like creature in Lenia obtained by hand-made mutation) dies from perturbation by obstacles.
b) Among the creatures found by hand-made random exploration, some die from perturbation by the obstacle, while some, by luck, are able to resist these perturbations.
c) Creatures obtained in the previous section (which were trained to move forward without external perturbations) die from collisions with obstacles.
d) Some creatures obtained in the previous section resist some collisions (left) but often die from other collisions (right). The creature is the same in both videos.
Cells in the learnable channel can only sense an obstacle through the changes/deformations it induces on them or their neighbours. In fact, as the only kernel going from the obstacle channel to the learnable channel is localized, if a macro agent emerges it has to "touch" an obstacle to sense it. To be precise, the agent can only sense an obstacle because the interaction with the obstacle perturbs its own configuration and dynamics (i.e. its shape and the interactions between the cells constituting it). This is similar to experiments with swarming bacteria.
Additionally, we impose that the obstacles stay still, meaning that no rule goes toward (and hence no update is applied to) the obstacle channel. As such, an update step in the final system is summarized in the figure above, with channel 1 being the learnable channel and channel 2 the obstacle channel.
To grasp the impact of the new obstacle channel and physics, we then tested how the previously-found moving creatures react to this environment.
The creatures found by hand in Lenia are not very robust to this new environment's physics. A glider-type creature found in 1-channel Lenia dies from most collisions with external obstacles (figure a). Another multi-channel creature (figure b, left) dies from particular collisions with the wall. Only one multi-kernel creature was able to sense the wall and resist the perturbation, but even this required us to manually slow down Lenia's time (parameter T) so that the creature could make smaller updates. And even then, the creature's movements are somewhat erratic.
Similarly, the moving creatures obtained with gradient descent and curriculum learning in the previous section do not display much robustness to collisions with obstacles (figure c): only a few, by luck, already have some level of robustness (figure d). This motivated the need for training methods which, given this environmental physics in the CA paradigm, are able to learn the parameters leading to the emergence of agency and sensorimotor capabilities with better resilience to perturbations.
Training method with stochastic environmental perturbations
Two different configurations of obstacles during training
The inner optimization samples diverse positions of obstacles, allowing generalization as it induces different perturbations on the agents during training. For instance, this figure displays 2 examples of sampled configurations (blue circle positions), and we can see that the perturbations on the creature's structure/morphology are totally different in the left and right figures.
To learn rules leading to the emergence of a creature that resists and avoids various obstacles in its environment, we simply introduce (randomly generated) obstacle configurations within the training process, as shown by the new steps in the training pseudocode below. This way, the inner loop (goal-directed gradient descent) becomes stochastic gradient descent, with the stochasticity coming from the sampling of the obstacles. The learning process thus encounters a lot of different obstacle configurations and may find a general behavior. In practice, we only put obstacles in half of the lattice grid. This way, as shown in the above figure, the first half of the grid is free from obstacles, which allows the system to first learn a creature that is able to move without any perturbation, as was done in the previous section. Then, as we push the target further and further, the creature starts to encounter obstacles. The deeper the target position is, the more obstacles the creature encounters, and so the more robust it should become; the curriculum consists of going further and further, because the further the creature goes, the more obstacles it has to resist. In the IMGEP, at the end of each goal-directed inner optimization, goal achievement is measured as the distance between the target position and the average position attained over different random configurations of obstacles.
Perform random policies in Lenia, saving the obtained (parameters, reached goal) tuples in history \(\mathcal{H} = (p_i,rg_i)_{i=1,..,s}\)
Loop (number of IMGEP steps):
  Sample a target position/goal (not too far from reached positions in the history \(\mathcal{H}\))
  Select, from the history, the parameters that achieved the closest position/goal
  Initialize the system with those parameters
  Loop (number of optimization steps):
    Sample random obstacles
    Run Lenia
    Gradient descent toward the target shape at the target position to optimize the parameters
    Initialize the system with those optimized parameters
  Compute the mean position (i.e. goal) achieved over several random-obstacle runs:
  Loop (number of random runs):
    Sample random obstacles
    Run Lenia
    Add the reached goal to the mean
  If the creature died or exploded during one of the tests, don't save
  Else, add to the history the parameters resulting from optimization and the mean outcome/goal reached: \(\mathcal{H} =\mathcal{H} \cup (p^\star,rg )\)
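Compared to the earlier sketch, the changes amount to sampling fresh obstacles at every gradient step and scoring a policy by its average outcome over several random obstacle configurations (all helpers hypothetical, as before):

```python
def noisy_gd_steps(p, g, n_opt_steps):
    """Inner loop: the obstacle sampling makes it stochastic gradient descent."""
    for _ in range(n_opt_steps):
        obstacles = sample_random_obstacles()  # new configuration at each step
        p = gd_step(p, g, obstacles)
    return p

def evaluate(p, n_runs):
    """Mean reached goal over several random obstacle configurations,
    or None if the creature died or exploded in any of the test runs."""
    outcomes = [run_lenia(p, sample_random_obstacles()) for _ in range(n_runs)]
    if any(died_or_exploded(o) for o in outcomes):
        return None                            # don't save this policy
    return mean_position(outcomes)
```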
The success of the method in producing rules that lead to the emergence of sensorimotor agents highly depends on the IMGEP initialization (first line of the pseudocode). We refer to the "overcoming bad initialization" section of the appendix for more information on how we solve this problem. Additional experimental details are also provided in the appendix.
Robust moving creatures obtained
With this method, we obtain a wide variety of creatures that seem to easily travel through random configurations of obstacles like the ones seen during training. Out of 10 trials of the method with initialization selection (see the appendix for more on initialization selection), 7 succeeded in producing such creatures. You can find the failure cases in the failure cases section of the appendix.
The obtained emerging creatures are robust to perturbation by obstacles.
The creatures were obtained by adding obstacle perturbations during training. This robustness was not observed when the creature was trained to go forward without any perturbation, as shown in figure c at the beginning of the section. The obstacle configurations are totally random and have not been seen during training. In the first 3 videos, yellow corresponds to the learnable channel (where we want the creature to emerge) and blue to the obstacles in the system. In the last video, the learnable channel has a custom colormap, allowing one to see more easily the wide range of continuous states in the creature, while the obstacles are in black.
How well do the creatures obtained generalize?
In this section, we investigate the generalization of the discovered sensorimotor agents to several out-of-distribution perturbations that were not encountered during training. While the creatures were trained on a diversity of obstacle positions (but with a fixed number, shape and size of obstacles), we can imagine a much larger and more challenging set of evaluation tasks to assess the agents' general sensorimotor capabilities. Below we show a sample of such possible test tasks, while many more could be envisaged. To better display the versatility of some of the obtained creatures, we keep the same creature for all the demos below except where stated otherwise. In the first part of this section, we look at the capabilities of the obtained creatures "alone", without paying attention to creature-creature interactions. In the second part, we test the capabilities of the obtained creatures to interact with each other in the multi-creature setting.
Single creature setting
ARE THE CREATURES LONG-TERM STABLE?
Even if we cannot know whether a creature is indefinitely stable, we can test it for a reasonable number of timesteps. The result is that the successful creatures obtained with the IMGEP with obstacles seem stable for 2000 timesteps, while they have only been trained to be stable for 50 timesteps. This might be because, having learned to be robust to deformation, a creature has learned a strong preservation of its structure that prevents any explosion or death when perturbed a little. And so, when there is no perturbation, this layer of "security" strongly preserves the structure. However, training a creature only for movement (without obstacles, and so with no perturbation during training) sometimes led to creatures that are not long-term stable. This is similar to what has been observed in
ARE THE CREATURES ROBUST TO NEW OBSTACLES?
The resulting creatures are very robust to wall perturbations and able to navigate in difficult environments with various unseen configurations of obstacles, including vertical walls (see the interactive demo). One very surprising emerging behavior is that the creature is sometimes able to come out of dead ends, showing how well this technique generalizes. There are still some failure cases, with some obtained creatures becoming unstable after certain perturbations, but the creatures are most of the time robust to a lot of different obstacles. The generalization is due to the large diversity of obstacles encountered by the creature during learning. Moreover, as it learns to go further, the creature has to learn to collide with several obstacles one after the other, and so to recover fast while still being able to resist/sense a second obstacle before having fully recovered.
ARE THE CREATURES ROBUST TO MOVING OBSTACLES?
We can make a harder, out-of-distribution environment by adding movement to the obstacles. For example, we can build a bullet-like environment where the tiny wall disks are shifted by a few pixels at every step. The creature seems quite resilient to this kind of perturbation, even if we can see that a well-placed perturbation can kill it. This kind of environment differs a lot from what the creature has been trained on, and therefore shows how much the creature has learned to quickly recover from perturbations, even unseen ones.
ARE THE CREATURES ROBUST TO ASYNCHRONOUS UPDATES?
As done in
ARE THE CREATURES ROBUST TO CHANGES OF SCALE?
The grid is the same size as above, giving an idea of the scale change (kernel radius * 0.4).
We can change the scale of the creature by changing the radius of the kernels as well as the size of the initialization square (with an approximate resize). This is a surprising generalization, as it completely changes the number of cells constituting the macro entity. We can make much smaller creatures, which therefore have fewer pixels to do the computation. This scale reduction has a limit, but we can get pretty small creatures. The creatures still seem to be quite robust and able to sense and react to their environment while having less space to compute. We can also go the other way around and make much bigger creatures, which therefore have more space to compute (but also more cells to organize). Interestingly, evidence of multi-scale adaptivity is also something that can be observed in biological organisms.
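A possible sketch of this rescaling (a hypothetical helper; it scales the kernel radius parameter R and approximately resizes the learned initialization patch by the same factor):

```python
import jax

def rescale_creature(init_patch, R, factor):
    """Change the creature's scale: scale the kernel radius R and resize
    the learned initialization patch by the same factor (approximate)."""
    new_R = R * factor
    new_shape = tuple(int(s * factor) for s in init_patch.shape)
    new_patch = jax.image.resize(init_patch, new_shape, method="linear")
    return new_patch, new_R
```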
ARE THE CREATURES ROBUST TO CHANGES OF INITIALIZATION?
While the creature's initialization has been learned with many degrees of freedom, we can check whether the same creature can emerge from other (maybe simpler) initializations. This capacity to converge to the desired shape in spite of a different initialization can be found in nature, as shown in this work from Vandenberg et al.
DO THE CREATURES REACT TO BODY DAMAGE?
In order to navigate, the creature first needs to sense the wall through a deformation of its macro structure. After this deformation, it has to make a collective "decision" about where to grow next, and then move and regrow its shape. We can even apply the deformation ourselves by suppressing a part of the creature: the result is that the creature effectively changes direction as if an obstacle were present. This confirms that the perturbation of the macro structure is what leads to the direct change of direction. Looking at the kernels' activity, it is not clear which kernels (if not all of them) are responsible for these decisions; how the decision is made remains a mystery. Moreover, some cells don't even see the deformation because they are too far away, meaning that messages from the cells sensing it have to be transmitted.
Multi creature setting
By adding more initialization squares in the grid, we can add several macro creatures constituted of identical cells (with the same update rule), letting us observe multiple creatures. As pointed out by R.D. Beer
INDIVIDUALITY
For the creature displayed here, we had to tune the kernels by hand after training in order to get individuality. But we sometimes obtain individuality directly from training.
Some of the obtained creatures show strong individuality preservation. In fact, the creatures engage in non-destructive interactions most of the time without merging, despite the fact that they are all made of identical cells. If individuality isn't obtained during training, we can tune the weights of the growth functions (especially the limiting-growth one) to make the merging of two creatures harder. By increasing those limiting-growth kernels, the repulsion between two entities gets stronger and they simply change direction. Individuality has also been observed in the "orbium" creature found by hand in Lenia, for example, but it was much more fragile, with a lot of collisions leading to destruction or explosion. It's interesting to notice that individuality can be obtained as a byproduct of training an agent alone. In fact, our intuition is that by trying to prevent too much growth, the creature learned to suppress any living cells that would make it "too big", including, in the multi-creature case, living cells from other creatures. Out of the 10 random trials, 4 led to the emergence of creatures with strong individuality preservation.
ATTRACTION
If they are too far from each other, no attraction occurs.
Another type of interaction between two creatures of the same species (governed by the same update rule/physics) is creatures getting stuck together. The two creatures (here a different creature than the one shown above) seem to attract each other a little when they are close enough, leading to the two creatures sticking together and going in the same direction. When they encounter an obstacle and separate briefly, their attraction reassembles them. Even when they're stuck together, from a human point of view, we can still see 2 distinct creatures. This type of behavior is studied in the Game of Life in
REPRODUCTION
Another interesting interaction we observed during collisions was "reproduction". In fact, for some collisions, we could observe the birth of a 3rd entity. This kind of interaction seemed to happen when one of the two colliding entities was in a certain "mode", such as just after hitting a wall. Our intuition is that when a creature hits a wall, it has to produce a growth response in order to recover. If the perturbation of another entity occurs during this growth response, the growing mass may separate from the creature, and this separated mass, through strong self-organization, grows into a complete individual.
THEORETICAL FOUNDATIONS OF COGNITION
Cellular automata have been used as a testbed/showcase for theories of cognition, identity and life (e.g. what are the necessary parts needed for "life"?). The Game of Life was particularly studied
On the other hand, information theory can be used to define more clearly concepts such as individuality, agency and cognition, by giving measures of such concepts
NEURAL CA
Neural cellular automata (NCA) use the flexibility and differentiability of neural networks to express and learn the update rule for a variety of tasks
SWARM ROBOTICS
We can draw parallels with swarm robotics which dictates how several agents should sense and communicate locally in order to arrange themselves in the group and in their environment
VOXEL BASED SOFT ROBOTS
Voxel-based soft robots are composed of several tiny blocks/entities glued together that can contract (actuators). The contractions are either automatic contractions
Work on soft robotics has focused on designing the morphology of soft robots using cellular automata as builders of the morphology, as well as responsible for regeneration
CO-EVOLVING THE MORPHOLOGY AND THE CONTROLLER
Evolving the morphology to find good (and diverse) morphologies for a task (with the associated global controller) has been studied in
Pathak et al
OPEN ENDED EXPLORATION
The application of intrinsically-motivated goal exploration processes for the automated discovery of self-organizing patterns in Lenia follows population-based IMGEP systems in
Discussion
In closing this blog post, let us reiterate that what is interesting in such a system is that the computation of decisions is done at the macro (group) level, showing how a group of simple identical entities can "decide" and "sense" at the macro scale through local interactions only, and without a clear notion of body/sensor/actuator. Seeing the discovered creatures, it's even hard to believe that they are in fact made of tiny parts all behaving under the same rules. While some basic sensorimotor capabilities (spatially localized and moving entities) had already been found in Lenia with random search and basic evolutionary algorithms, this work makes a step forward, showing how Lenia's low-level rules can self-organize robust sensorimotor agents with strong adaptivity and generalization to out-of-distribution perturbations.
Moreover, this work provides a more systematic method, based on gradient descent, diversity search and curriculum-driven exploration, to learn the update rule and initialization state from scratch in a high-dimensional parameter space, leading to the emergence of different robust creatures with sensorimotor capabilities. We believe that the set of tools presented here can be useful in general to discover parameters that lead to complex self-organized behaviors.
Yet, most of the analyses we make in this work are subjective. Future work might aim for a better definition of agency and sensorimotor capabilities by defining measures of such behaviors.
Also, engineering subparts of the environmental dynamics with functional constraints (through predefined channels and kernels) has been crucial in this work to shape the search process
In fact, beyond individual capabilities, we could even wonder under what conditions one could observe the emergence of an open-ended evolutionary process
Beyond those fundamental scientific questions, future work might also consider the biological implications and applications of this work. Inferring low-level rules to control complex system-level behaviors is a key problem in regenerative medicine and synthetic bioengineering