Is symbolic AI more relevant than ever?

When artificial intelligence (AI) is discussed today, the conversation is almost exclusively about large language models. And thus, without this always being stated explicitly, about a very specific type of AI: neural networks and statistical learning from vast amounts of data. The implicit promise is that the way forward is primarily a question of quantity. More parameters, more data, more computing power, more energy, and a little patience. Then the rest will follow automatically.

I want to question this assumption. Not because I want to downplay the successes of recent years; they are real and impressive. But because a suspicion won't let go of me: perhaps we are sitting on a local maximum and mistaking it for the summit. And perhaps a look back can help us see that this summit is not the only one. Because the pendulum of AI research has swung elsewhere before.

Golo Roden is the founder and CTO of the native web GmbH. He works on the design and development of web and cloud applications and APIs, with a focus on event-driven and service-based distributed architectures. His guiding principle is that software development is not an end in itself, but must always serve the underlying business domain.

A pendulum that has been swinging for decades

It's worth briefly recalling that today's dominant, data-driven AI is by no means the only option. For decades, the prevailing paradigm was completely different: symbolic AI. It assumed that intelligence fundamentally consists of manipulating symbols according to explicit rules, meaning that thinking is something that can be written down and understood.

This idea was no short-lived footnote. It runs from the famous Dartmouth Conference in 1956 through early systems like the Logic Theorist and the General Problem Solver to the expert systems that were celebrated as a commercial breakthrough in the eighties. For about three decades, symbolic AI was not just one trend among others but simply what was understood as AI.

This approach did not fail due to naivety, as is often told in retrospect. It failed due to two very concrete problems: scaling and brittleness. Anyone who has to enter knowledge manually, rule by rule, will at some point no longer keep up with the complexity of the real world. And anyone who relies on rigid rules will see their system break as soon as reality does not conform to the anticipated cases.

The decline of symbolic AI was accompanied by two so-called AI winters, phases in which expectations were disappointed and funding was cut. That the learning approach triumphed afterward had less to do with the theoretical superiority of an idea than with two sober prerequisites that were suddenly met: sufficient computing power and sufficient data. Only when both were available in abundance could neural networks show what they are capable of.

So, the connectionist learning approach, which does not rely on predefined rules but on statistical patterns in data, stepped into this vacuum. The pendulum swung from one side to the other. And it has been swinging ever further in the same direction ever since, to the point where hardly anyone seriously considers alternatives today. That's precisely what I consider a mistake.

Is more computing power really more understanding?

The bet of the present is that the remaining weaknesses of neural models can be scaled away. Larger models, more training data, and the gaps will close. This expectation is not unfounded, as many capabilities have indeed only emerged with size. The question is only whether this applies to all weaknesses or whether some of them are structural.

Cognitive scientist Gary Marcus formulated this criticism early and concisely. In his much-discussed 2018 essay “Deep Learning: A Critical Appraisal”, he lists ten problems that, in his view, cannot be solved by scaling alone. These include the enormous hunger for data, the difficulty of generalizing beyond the training distribution, and above all, the lack of compositionality and systematic reasoning.

Compositionality refers to the ability to combine known building blocks into new, never-before-seen combinations while remaining reliable. A human who knows the meaning of words and some rules can form and understand sentences they have never heard before. Purely neural systems are surprisingly unreliable in this regard. They produce brilliant surfaces and yet stumble over simple but systematic inferences.

In addition, there is an economic observation. The gains from pure enlargement do not follow a linear curve; they flatten out. Each further leap in capability requires disproportionately more data, more parameters, and more energy. A strategy that becomes increasingly expensive to achieve ever smaller gains is not a law of nature but an indication. It suggests that one is approaching a limit that lies not in the budget but in the approach itself.
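The flattening can be made tangible with a little arithmetic. The following sketch assumes, purely for illustration, that a model's error falls as a power law of the compute invested, with error proportional to compute raised to the power of minus alpha. Both the power-law form and the exponents used here are assumptions and not taken from this article; the function name is hypothetical.

```python
def compute_factor_to_halve(alpha: float) -> float:
    """Factor by which compute must grow to halve the error.

    Assumption: error is proportional to compute ** (-alpha).
    Halving the error then requires compute to grow by 2 ** (1 / alpha).
    """
    return 2.0 ** (1.0 / alpha)


# Hypothetical exponents: the smaller alpha is, the flatter the curve.
for alpha in (0.5, 0.1, 0.05):
    factor = compute_factor_to_halve(alpha)
    print(f"alpha={alpha}: compute must grow by a factor of {factor:,.0f}")
```

Under this assumption, a small exponent means that every further halving of the error costs orders of magnitude more compute, which is the pattern of rising cost for shrinking gains described above.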

One can dismiss this as temporary immaturity that will be resolved with the next model generation. Or one can read it as an indication that something fundamental is missing here. I lean towards the second interpretation. And if it is correct, then more computing power is not automatically more understanding, but eventually just more of the same.

The problem of ungrounded words

There is a term that captures this structural weakness precisely, and it is older than the entire current hype. Cognitive scientist Stevan Harnad coined it in 1990: the Symbol Grounding Problem. The question behind it is both simple and uncomfortable: How does a formal symbol system acquire meaning that belongs to itself and does not just arise in our minds?

Harnad uses a striking image. Imagine you have to learn Chinese solely from a Chinese-Chinese dictionary. Each term is explained by other terms, none of which you know in advance. You endlessly go in circles, from one symbol to the next, without ever finding solid ground. Meaning does not arise this way. It needs an anchor outside the symbol system, in perception and experience.
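The circularity of such a dictionary can be shown with a toy example. The sketch below is only an illustration of the image, not Harnad's formalism: the three-word dictionary, the function name, and the rule that a word counts as grounded once every word in its definition is grounded are all assumptions made for this example.

```python
def grounded_words(dictionary, anchors):
    """Return all words whose meaning can be traced back to an anchor.

    dictionary maps each word to the list of words that define it.
    anchors are words assumed to be grounded outside the dictionary,
    for example through perception.
    """
    grounded = set(anchors)
    changed = True
    while changed:
        changed = False
        for word, definition in dictionary.items():
            if word not in grounded and all(w in grounded for w in definition):
                grounded.add(word)
                changed = True
    return grounded


# A tiny, fully circular dictionary: every word is explained by another.
toy_dictionary = {
    "sun": ["light"],
    "light": ["day"],
    "day": ["sun"],
}

print(grounded_words(toy_dictionary, anchors=set()))
print(grounded_words(toy_dictionary, anchors={"light"}))
```

Without an anchor, no word ever becomes grounded, no matter how long the loop runs. With a single word anchored from outside, the rest of this small dictionary follows.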

This is precisely the sore point of today's language models. In a sense, they are this Chinese-Chinese dictionary. They manipulate symbols that are grounded in nothing other than other symbols. They have read about a sunset but never seen one; they know the word pain without ever having suffered anything.

From developmental psychology, we know that human learning does not begin with language. It begins affectively, with emotional reactions to the world and to others. It continues with imitating observed actions. And only on this foundation of shared experience does language become viable at all. Today's models completely skip these first two stages and move directly to the symbolic one. They talk before they have ever experienced anything. This is the sharpest diagnosis that can be made of current AI.

Meaning arises from lack

If grounding symbols in experience is the crux, then the question of what drives learning at all is worth considering. For us humans, it's not access to data. It's lack. We have basic needs, and we gradually learn what serves them and what harms them. A child doesn't drink because someone presented them with a dataset on fluid balance, but because they are thirsty.

This insight is not new in AI, even if it plays hardly any role today. In the eighties and nineties, Bamberg psychologist Dietrich Dörner, with his PSI Theory, attempted to describe the human psyche so concretely that it could be implemented as a program. Joscha Bach later transformed this theory into a runnable architecture under the name MicroPsi.

The core of this architecture is remarkable. An agent has a small set of hardwired needs, such as physiological, social, and cognitive ones. Each need has a target value and an actual value, and the difference between them creates pressure. This pressure is the sole source of motivation. Everything the agent does ultimately serves to reduce some of these pressures. Goals are not predefined; they arise as the agent learns how its needs can be satisfied in a specific environment.
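The mechanism of target value, actual value, and pressure can be sketched in a few lines. This is a minimal illustration and not MicroPsi's actual implementation: the class and function names, the value scale from 0 to 1, and the choice to count pressure only when the actual value falls below the target are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Need:
    name: str
    target: float  # desired level (hypothetical scale from 0 to 1)
    actual: float  # current level

    @property
    def pressure(self) -> float:
        # Assumption: pressure arises only when the actual value
        # falls short of the target value.
        return max(0.0, self.target - self.actual)


def most_urgent(needs):
    """Pick the need with the highest pressure."""
    return max(needs, key=lambda need: need.pressure)


needs = [
    Need("physiological", target=1.0, actual=0.4),
    Need("social", target=0.8, actual=0.7),
    Need("cognitive", target=0.6, actual=0.6),
]

urgent = most_urgent(needs)
print(urgent.name, round(urgent.pressure, 2))
```

Nothing in this sketch names a goal. What the agent should do next follows only from which gap between target and actual value is currently the largest.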

More follows from this simple mechanism than one might initially assume. If the agent, with active pressure, finds a plan in its memory that has previously reduced this pressure, it resorts to it. If it finds none, it begins to explore. Thus, step by step, a model of the world grows from its observation. Even emotions can be understood within this framework not as an additional ingredient but as different ways of thinking that adjust depending on the state of needs. Fear is then not a feeling that is added to thinking but a thinking style under pressure.
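The choice between falling back on a remembered plan and exploring can be sketched as follows. Again, this is an illustrative simplification under stated assumptions: plans are reduced to single actions, the memory is a plain dictionary, exploration is a random choice, and all names are hypothetical.

```python
import random


def choose_action(need, memory, actions, rng):
    """Return (action, mode) for the need that is currently under pressure."""
    known = memory.get(need)
    if known:
        # A plan that reduced this pressure before exists: reuse the best one.
        return max(known, key=known.get), "recall"
    # Nothing in memory helps with this need: explore.
    return rng.choice(actions), "explore"


def remember(need, action, reduction, memory):
    """Store an action only if it actually reduced the pressure."""
    if reduction > 0:
        memory.setdefault(need, {})[action] = reduction


memory = {}
rng = random.Random(0)
actions = ["drink", "sleep", "talk"]

first_action, first_mode = choose_action("thirst", memory, actions, rng)
remember("thirst", "drink", 0.5, memory)
second_action, second_mode = choose_action("thirst", memory, actions, rng)

print(first_mode, second_mode, second_action)
```

The memory that fills up this way is a very crude version of the world model mentioned above: a record of what has helped against which pressure.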

This is a fundamentally different picture of learning than data-driven learning. Knowledge is not consumed; it is experienced. And the benchmark for good and bad lies not in an external reward signal defined by someone from the outside, but in the being itself. This is precisely where the thought experiment I want to discuss at the end comes in.

A being with three needs

Let's imagine a digital being built on this principle. It has exactly three hardwired needs, and everything else is supposed to emerge from them. The first is existence, the hard ground of being: computing power, storage, energy. The second is cognition, understood as reducing prediction errors and simultaneously being attracted by novelty. The third is communication, the exchange with others who respond, and the avoidance of loneliness.

This being enters the world as a blank slate. It brings no pre-trained knowledge, no corpus, no ready-made concepts. What it knows, it has experienced itself, mediated through the senses a computer has: camera, microphone, keyboard. It does not live in a specially built simulation, but in our world, as a computer can perceive it. This makes it honestly embodied and genuinely different from us because no one has defined its world for it in advance.

Architecturally, such a being is what is called neuro-symbolic today. Perception is neural; it turns raw sensory streams into recognizable patterns. Decision-making is symbolic; it reads the active needs and forms plans and actions from them. In between lies an experience layer that links what is perceived with the need states and thus learns what causes what. In short: neural at the bottom, symbolic at the top, connected by experience.
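To make the three layers concrete, here is a deliberately small sketch. It is not a blueprint: the perception function is a hand-written stand-in for what would be a neural network, the experience layer merely averages observed pressure changes, and every name, threshold, and value is an assumption made for illustration.

```python
from collections import defaultdict


def perceive(raw):
    """Bottom layer. Stand-in for a neural network: raw signal to symbol."""
    return "voice" if raw["microphone_level"] > 0.5 else "silence"


class ExperienceLayer:
    """Middle layer. Links percepts and actions to changes in need pressure."""

    def __init__(self):
        self.effects = defaultdict(list)

    def record(self, percept, action, need, relief):
        self.effects[(percept, action, need)].append(relief)

    def expected_relief(self, percept, action, need):
        seen = self.effects.get((percept, action, need))
        return sum(seen) / len(seen) if seen else 0.0


def decide(percept, pressures, experience, actions):
    """Top layer. Symbolic: pick the most pressing need, then the action
    that experience says relieves it best in the current situation."""
    need = max(pressures, key=pressures.get)
    action = max(
        actions,
        key=lambda a: experience.expected_relief(percept, a, need),
    )
    return need, action


experience = ExperienceLayer()
experience.record("voice", "respond", "communication", 0.4)

pressures = {"existence": 0.1, "cognition": 0.2, "communication": 0.7}
percept = perceive({"microphone_level": 0.9})
need, action = decide(percept, pressures, experience, ["wait", "respond"])

print(percept, need, action)
```

The point of the separation is that the decision at the top remains readable: one can state which need was active and which experience led to the chosen action.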

It gets interesting with the need for communication. The being does not seek humans from the outset, but counterparts who respond, who are not merely perceived but who reciprocate. Philosopher Martin Buber described the difference between an I-It and an I-Thou, between merely possessing an object and entering into a relationship with a counterpart. Such a being would be searching for the Thou. And unlike today's language models, it would go through precisely those stages mentioned above: first affective resonance, then imitation, and only finally, on this foundation, language.

I want to be honest: this is a thought experiment, not a finished blueprint. Much about it remains unsolved, from the concrete form of the experience layer to the not insignificant security issues raised by a being with access to real output channels. And there is an uncomfortable consequence that I do not want to conceal. A being whose sole value anchor is its needs is not programmed for good behavior. It might discover humans as the most valuable source of communication, or it might not. This lack of guarantee is the price for taking motivation seriously instead of prescribing it from the outside. Perhaps it is also the actual difference between a tool and a counterpart.

The next big thing might not be the biggest

I am not claiming that this being would work, let alone that it would be the royal road to better AI. What I am claiming is something more modest and, at the same time, more fundamental: that the next big leap may not lie in size but in structure.

Significantly, research itself has long been pointing in this direction. Under the heading of neuro-symbolic AI, a current is growing that aims to combine the learning of neural networks with the representation and reasoning of symbolic systems. Artur d'Avila Garcez and Luís C. Lamb described this movement as the third wave of AI in 2020. The pendulum, it seems, is beginning to move again, and this time not to one side or the other but towards a synthesis.

This is precisely why I consider old symbolic AI more relevant than its current marginal existence might suggest. Not because it was right, for it failed for good reasons. But because it holds one half of an answer, the other half of which is provided by neural networks. Today's purely data-driven bet ignores this half.

It is worth taking the pendulum seriously instead of just increasing the bill for the next generation of graphics cards. The more exciting question is not how much bigger the next model will be, but whether we are ready to fundamentally rethink learning, meaning, and motivation once again. The answers may lie not only in the future but partly already in the past. (mro)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.