Self-Evolving Cellular Automata — Kai Saksela


We start with a grid of cells that update in discrete steps.

All cells are computed in parallel, but we'll look at the progression from the perspective of a single focal cell (index 1). Each cell interacts with its 8 immediate neighbors, forming a group of 9, indexed as in the image to the left.

At each step $t$ the focal cell has a non-negative energy scalar $E_1^{(t)}\in\mathbb{R}_+$, a transfer matrix $\mathbf{W}_1^{(t)}\in\mathbb{R}^{9\times 9}$, and a bias vector $\mathbf{b}_1^{(t)}\in\mathbb{R}^{9}$. Both $\mathbf{W}$ and $\mathbf{b}$ flow with the energy and get updated by incoming energy.

To compute how energy spreads, stack the 9 neighborhood energies into $\mathbf{e}^{(t)}$. We use an affine transformation followed by a nonlinearity to produce a non-negative spread:

$$\begin{aligned} \mathbf{z}_1^{(t)} &= \mathbf{W}_1^{(t)}\,\mathbf{e}^{(t)} + \mathbf{b}_1^{(t)}, \\ \mathbf{s}_1^{(t)} &= f\!\left(\mathbf{z}_1^{(t)}\right) + \epsilon. \end{aligned}$$

We use the rectified linear function, $f(x)=\max(0,x)$, so the spread is non-negative without constraining $\mathbf{W}$. The small $\epsilon$ prevents the all-zero case.

$$\mathbf{p}^{(t)} = \frac{\mathbf{s}_1^{(t)}}{\sum_{j=1}^{9} s_j^{(t)}},\qquad \mathbf{e}_1^{(t+1)} = E_1^{(t)}\,\mathbf{p}^{(t)}.$$

This is done in parallel for all cells, accounting for walls and blocked cells, and the incoming contributions are summed to get the energy grid at the next step.
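As a concrete sketch, the spread computation for a single focal cell can be written in NumPy. The function name and the `eps` default are illustrative choices, not from the article:

```python
import numpy as np

def spread_energy(E, W, b, e, eps=1e-8):
    """Distribute a cell's energy E over its 3x3 neighborhood.

    E : scalar energy of the focal cell
    W : (9, 9) transfer matrix of the focal cell
    b : (9,) bias vector
    e : (9,) energies of the 9 neighborhood cells (focal cell included)
    """
    z = W @ e + b                 # affine transformation
    s = np.maximum(0.0, z) + eps  # ReLU keeps the spread non-negative;
                                  # eps avoids the all-zero case
    p = s / s.sum()               # normalize to a distribution over the 9 cells
    return E * p                  # energy sent to each neighborhood cell

# Example: a cell with unit energy and a randomly sampled mechanism.
rng = np.random.default_rng(0)
out = spread_energy(1.0, rng.normal(size=(9, 9)), rng.normal(size=9),
                    rng.uniform(size=9))
# out.sum() equals the focal cell's energy: the spread only redistributes it.
```

Because `p` is a normalized distribution, each cell's outgoing energy sums to exactly its current energy, so total energy is conserved across the grid.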

The mechanism moves with the energy: the transfer-matrix update is proportional to incoming energy flow. Let $\mathbf{e}_{j\to 1}^{(t)}$ denote the energy that arrives at the focal cell from neighbor $j$. We form a weighted average of the neighboring matrices:

$$\tilde{\mathbf{W}}_1^{(t+1)} = \frac{\sum_{j=1}^{9} \left(\mathbf{e}_{j\to 1}^{(t)}\right)^2\,\mathbf{W}_j^{(t)}}{\sum_{j=1}^{9} \left(\mathbf{e}_{j\to 1}^{(t)}\right)^2 + \epsilon}$$

We square the energy when weighting, which emphasizes directions with higher energy flow. The bias vector $\mathbf{b}$ is updated in the same way. The same $\epsilon$ keeps the denominator non-zero.
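The weighted average above can be sketched in a few lines of NumPy; the function name and the use of `tensordot` are my own choices:

```python
import numpy as np

def mechanism_update(W_neighbors, e_in, eps=1e-8):
    """Energy-weighted average of the 9 neighboring transfer matrices.

    W_neighbors : (9, 9, 9) stack of the neighbors' transfer matrices
    e_in        : (9,) energy arriving from each neighbor
    """
    w = e_in ** 2                               # squared weights emphasize
                                                # high-energy directions
    num = np.tensordot(w, W_neighbors, axes=1)  # sum_j w_j * W_j -> (9, 9)
    return num / (w.sum() + eps)                # eps keeps the denominator non-zero

rng = np.random.default_rng(1)
Ws = rng.normal(size=(9, 9, 9))
e_in = np.zeros(9)
e_in[3] = 2.0                       # all incoming energy comes from neighbor 3
W_new = mechanism_update(Ws, e_in)  # ~ a copy of neighbor 3's matrix
```

In the limiting case shown, all the weight sits on one neighbor, so the cell effectively inherits that neighbor's mechanism, which is how mechanisms propagate along energy flows.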

Finally, we inject randomness when energy is low. The idea is that empty or low-energy cells should mutate their mechanisms, while high-energy cells preserve them:

$$\mathbf{W}_1^{(t+1)} = \lambda_1^{(t+1)}\,\tilde{\mathbf{W}}_1^{(t+1)} + \beta\,\boldsymbol{\xi}\left(1-\lambda_1^{(t+1)}\right),\qquad \boldsymbol{\xi}\sim\mathcal{N}(0,1)$$

$$\lambda_1^{(t+1)} = \frac{E_1^{(t+1)}}{\beta + E_1^{(t+1)}}$$

Here $\beta$ sets the overall noise scale. When $E_1$ is small, $\lambda$ is near zero and randomness dominates; as energy grows, the incoming mechanisms take over. The same blend is applied to $\mathbf{b}$.
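The two limiting behaviors of this blend are easy to check numerically. A minimal sketch (function name and parameter values are mine):

```python
import numpy as np

def inject_noise(W_tilde, E_next, beta, rng):
    """Blend the averaged mechanism with Gaussian noise based on energy."""
    lam = E_next / (beta + E_next)          # ~0 at low energy, -> 1 at high energy
    xi = rng.standard_normal(W_tilde.shape) # fresh Gaussian noise
    return lam * W_tilde + beta * xi * (1.0 - lam)

rng = np.random.default_rng(2)
W_tilde = np.ones((9, 9))

low  = inject_noise(W_tilde, E_next=0.0,   beta=0.1, rng=rng)  # lam = 0: pure noise
high = inject_noise(W_tilde, E_next=100.0, beta=0.1, rng=rng)  # lam ~ 1: ~unchanged
```

At zero energy the incoming mechanism is ignored entirely and the cell's matrix is resampled at scale $\beta$; at high energy the noise term vanishes and the averaged mechanism passes through almost untouched.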

To initialize the system, we sample $\mathbf{W}$ and $\mathbf{b}$ at random and place a chunk of energy in the middle. In the live simulation above, randomness is injected in a controllable region, and mechanisms spread and compete as energy flows.
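Putting the pieces together, one full update step of a small grid might look like the sketch below. This is my own assembly of the article's steps, with illustrative names throughout; for simplicity it wraps the grid on a torus rather than modeling the walls and blocked cells the article accounts for:

```python
import numpy as np

rng = np.random.default_rng(0)
N, beta, eps = 16, 0.1, 1e-8

# Initialization: random mechanisms, a chunk of energy in the middle.
E = np.zeros((N, N))
E[N // 2, N // 2] = 10.0
W = rng.normal(size=(N, N, 9, 9))
b = rng.normal(size=(N, N, 9))

# Row-major 3x3 neighborhood offsets; slot k's reverse direction is slot 8 - k.
offsets = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)]

def step(E, W, b):
    E_next = np.zeros_like(E)
    flow = np.zeros((N, N, 9))  # energy each cell sends to each neighborhood slot
    for i in range(N):
        for j in range(N):
            e = np.array([E[(i + di) % N, (j + dj) % N] for di, dj in offsets])
            s = np.maximum(0.0, W[i, j] @ e + b[i, j]) + eps
            flow[i, j] = E[i, j] * s / s.sum()
            for k, (di, dj) in enumerate(offsets):
                E_next[(i + di) % N, (j + dj) % N] += flow[i, j, k]
    W_new, b_new = np.empty_like(W), np.empty_like(b)
    for i in range(N):
        for j in range(N):
            # Energy received from each neighbor: its outflow slot pointing back here.
            w2 = np.array([flow[(i + di) % N, (j + dj) % N, 8 - k] ** 2
                           for k, (di, dj) in enumerate(offsets)])
            denom = w2.sum() + eps
            Wt = sum(w2[k] * W[(i + di) % N, (j + dj) % N]
                     for k, (di, dj) in enumerate(offsets)) / denom
            bt = sum(w2[k] * b[(i + di) % N, (j + dj) % N]
                     for k, (di, dj) in enumerate(offsets)) / denom
            # Low-energy cells mutate; high-energy cells inherit.
            lam = E_next[i, j] / (beta + E_next[i, j])
            W_new[i, j] = lam * Wt + beta * (1 - lam) * rng.standard_normal((9, 9))
            b_new[i, j] = lam * bt + beta * (1 - lam) * rng.standard_normal(9)
    return E_next, W_new, b_new

E, W, b = step(E, W, b)  # total energy stays at 10.0; mechanisms follow the flow
```

The double loop is written for clarity rather than speed; a practical implementation would vectorize the neighborhood gathers across the whole grid.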