Human Intelligence, Computation, and the Universal Machine


The Fluid Universality of Math and Computation

Sean McClure


Most people are uncomfortable with the notion that human cognition is nothing more than computation. After all, humans are currently far superior to machines at most tasks. It seems too limiting to frame the abilities of the human mind in terms of some symbol-churning machine that dryly converts inputs into outputs.

But the arguments for why this should be so always seem to fall flat. What is it about human thinking that is so different? If human intelligence is something more than computation, where does this “more” come from?

I argue that people’s aversion to the machine analogy stems from their misunderstanding of computation itself. When computing is viewed as some rigid, mechanical construct then the contrasts between human and machine are glaring. But computation is anything but fixed and rigid. It is capable of far more expressive, even creative power than most people realize.

Understanding how computing is capable of more than programmatic outputs requires we look into how machines make decisions, and the concept of universality. Even more critical is the appreciation and comprehension of what it is Gödel and Turing discovered.

Only by understanding how fluid and creative both mathematics and computation are can one appreciate the computing analogy fully, and rightfully start their arguments from this grounding. One must acknowledge what computation is, at its core, before taking either side of the debate.

This article looks to help provide that foundation.

I will be using the term computing “analogy” (with quotes) throughout this piece, but not as a declaration that human cognition is different than computing (i.e. only an analogy). One must not bake a conclusion into their premise, otherwise their writing is circular. Thus I use computing “analogy” only to enable a comparison between human and machine thinking.

FRAMING THE PROBLEM

Trishank Kuppusamy recently wrote 2 articles³ ⁴ arguing that people cannot be smarter than others based on computational universality, and invited me to weigh in on this idea. The comparative intelligence approach, it turns out, is a good way to juxtapose humans with machines. Thus rather than commenting on points raised throughout Trishank’s essays I will use this opportunity to frame our problem of comparing human cognition to computation.

OVERVIEW

  • SMARTNESS
  • MACHINE THINKING
  • FROM SYMBOL SHUFFLING TO UNIVERSALITY
  • THE CREATIVITY OF MATH AND COMPUTATION
  • ON THE VALIDITY OF THE COMPUTING ANALOGY
  • SPEED AS FAMILIARITY
  • THE GENIUS MYTH

Let’s begin.

SMARTNESS

The idea that some people are “smarter” than others seems apparent to many. There are those who appear capable of arriving at answers quicker, and/or producing better answers.

But the notion someone is “smarter” than someone else is suspect for a number of fundamental reasons. To begin, intelligence can be described as relating to logic, understanding, awareness, reasoning, planning, etc. To say we can observe a single individual excelling beyond others across all cognitive categories is a dramatic oversimplification.

We also know that singular definitions of intelligence like IQ fail both statistically¹ and scientifically² despite their continued push by psychologists (and online test takers). I have yet to see an argument against the referenced essays that isn’t logically fallacious or lacking in proper understanding of statistics, probability, and science.

But there is another angle one might take in their attempt to compare people by intelligence. Perhaps comparative intelligence can be salvaged by drawing parallels between human cognition and computation?

Thinking in terms of computation is attractive. It is arguably easier to grasp computing “analogies” (remember, quotes on purpose) than trying to dig into statistical/probability errors or comprehend the fundamental differences between complex and simple phenomena. There seems to be an intuitive connection between machines and human cognition, even for those who view such similarities as forced.

When people call upon computing analogies to argue “smartness” they are talking in some sense about processing power. How well can a computing machine convert inputs into outputs, and are some people able to make this conversion better than others? One might look for mental analogs to tangible computing components such as RAM, hard drive, and CPU. Surely there are physical versions of the kinds of concepts that are known to impact computation such as interfaces, memory, throughput, parallelism, and defects.

But when you understand computation at a more fundamental level there is no need to appeal to physical differences between individuals. Computing is ultimately agnostic to its substrate, for reasons that will become apparent throughout this article. Computing is about math, symbols, and information, and the paradoxical freedom that is afforded these constructs when a certain level of complexity is reached.

If we are going to make comparisons between people’s cognitive abilities, using computing as the analogy, then we need to understand how “thinking” is modeled in computing.

MACHINE THINKING

To start, we will look at how machines solve problems, and for that, we must understand the nature of problem solving itself. This brings us to how problems are modeled in computing.

DECISION PROBLEMS

The way decision making is modeled in computing is through the use of so-called decision problems. Decision problems are problems with yes/no answers and form a major part of computational complexity theory.

While humans introduce much more subjectivity into their decision making, we need to begin our comparison around something precise. We will see later that machines also introduce “subjectivity” by leveraging past “experiences” and using approximations to arrive at their solutions.

Decision problems are classified according to their complexity. The 2 most well-known complexity classes are P and NP. P stands for polynomial time, NP for nondeterministic polynomial time. Think of these as quickly solved and quickly verified, respectively. P problems can be both quickly solved and quickly verified, while NP problems can only(?) be quickly verified (not quite right, but stay with me).


FIGURE 2 The 2 most common complexity classes for decision problems.

Note the question marks beside “slowly solved” for the NP complexity class. We do not definitively know that NP problems can only be solved slowly, but nobody has ever found a quick and exact solution to an NP problem (for a problem of reasonable input size). This is the famous P vs NP question in computer science, which asks: are there problems that are harder to solve than to verify? Equivalently, can every problem whose solution is quickly verified also be quickly solved?
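The solve/verify asymmetry can be made concrete with a small, illustrative Python sketch using subset sum, an NP-complete problem. Checking a proposed subset takes time linear in its size, while the only obvious exact method is to try every subset, and the number of subsets doubles with each added element. The function names and numbers below are my own, for illustration only:

```python
import itertools

def verify(nums, target, candidate):
    """Verification is fast: check one proposed subset in linear time."""
    return all(c in nums for c in candidate) and sum(candidate) == target

def solve_brute_force(nums, target):
    """Solving by exhaustive search: try every subset (2^n of them)."""
    for r in range(len(nums) + 1):
        for subset in itertools.combinations(nums, r):
            if sum(subset) == target:
                return list(subset)
    return None

nums = [3, 34, 4, 12, 5, 2]
solution = solve_brute_force(nums, 9)
print(solution)                   # a subset summing to 9, e.g. [4, 5]
print(verify(nums, 9, solution))  # True
```

With 6 numbers there are only 64 subsets; with 100 numbers there are 2¹⁰⁰, which is why brute force stops being an option almost immediately.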

The current consensus among complexity theorists is that P ≠ NP since no one has ever found an instance, on reasonably sized inputs, to suggest otherwise⁸.

Why care about P vs NP in this article? The complexity class a decision problem falls into tells us how efficiently an algorithm can arrive at its solution. Since we’re using computing to compare people’s intelligence we need to appreciate the kind of problem being solved when people tackle a challenge.

Complexity classes do not technically demarcate problems by difficulty since the lower bounds on complexity classes are not known. For now, we will just say the complexity classes show us some problems can be solved quickly (and thus predictably) while other problems have no obvious quick solution. Figure 3 shows a qualitative comparison between the P and NP complexity classes.


FIGURE 3 Difference between P and NP with respect to speed of solving and verifying. For P, the time to compute the answer increases as a polynomial of the input size. For NP, the time to compute the answer increases exponentially with the size of input (on deterministic Turing machines).

The plotted lines give us a sense of how dramatically different P and NP are. To fully round out this conversation we need to include 2 other complexity classes, NP-Hard and NP-Complete. Figure 4 shows our 4 complexity classes arranged with respect to increasing complexity.


FIGURE 4 The relationship between the four major complexity classes (under the assumption P ≠ NP).

When computers attempt to solve a problem they are trying to resolve a challenge that sits inside one of these complexity classes. Where the challenge falls depends on its complexity; more specifically, on how difficult the problem is deemed to be. Again, we can only classify the difficulty of P problems definitively, since they have demonstrably fast solutions. All other problems are conjectured to be more difficult since no quick (and exact) solution has ever been found.

It’s important to understand that we cannot label a problem by its absolute difficulty, only its relative difficulty. This is because lower bounds are exceedingly difficult to prove.

Here are the complexity class definitions:

  • P: can be solved in polynomial time (“solved quickly”);
  • NP: can be verified in polynomial time (“checked quickly”);
  • NP-hard: every problem in NP can be reduced to it in polynomial time (it need not itself be in NP);
  • NP-complete: every problem in NP can be reduced to it in polynomial time AND it is itself in NP.

We can use the diagram in Figure 4 to reason about the relative difficulty of the challenge being tackled by a program. Solving a P problem is very different than solving an NP problem.

In fact, the type of problem one attempts to solve also changes what it means to “solve.” Saying a computer “solves” a problem doesn’t tell us much unless we agree on what the word “solve” means. It turns out, this depends on who you ask.

THE MATHEMATICIAN AND THE ENGINEER

Ask a mathematician what the word “solve” means and they’ll tell you something quite precise. A mathematical solution is an assignment of expressions to the unknown variables that makes the equality in the equation true. This kind of precision fits in with what math looks to accomplish.

But if you ask an engineer what it means to “solve” they’ll tell you something quite different. This is because an engineering solution is something that works, providing utility. It’s a good-enough solution to make progress in a given problem space.

Both of these types of solving are implemented by computers, and which one is used depends on the complexity class a problem falls into. P problems are solved using routines that produce the exact answer, whereas NP problems use approximations, known as heuristics, to arrive at a good-enough solution.


FIGURE 5 The 2 ways computers approach problem solving. Problems that fall into the P complexity class are solved via routines. Problems that fall into the NP complexity class are solved using heuristics.

P complexity class problems are solved “algorithmically.” This means there is a routine comprised of well-defined steps. This is possible because P problems have a known polynomial upper bound on their running time. We can predict how long it will take to solve them. But the NP complexity class consists of problems nobody has solved exactly, using an explicit step-by-step routine, for reasonable input sizes. NP problems require we dramatically loosen our definition of “solve.”
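The routine/heuristic split can be sketched in a few lines of Python. Sorting (a P problem) has an exact, predictable routine; for the travelling salesman problem the sketch falls back on the nearest-neighbour heuristic, a standard good-enough strategy that carries no guarantee of optimality. The function names and toy city coordinates are assumptions for illustration:

```python
import math

# Routine solving (P): sorting has an exact, well-defined procedure
# with a predictable O(n log n) running time.
def routine_sort(xs):
    return sorted(xs)

# Heuristic solving (NP-hard): build a travelling-salesman tour by
# always hopping to the closest unvisited city. Fast and good enough,
# but not guaranteed to be the shortest tour.
def nearest_neighbour_tour(cities):
    unvisited = set(cities)
    tour = [min(unvisited)]          # arbitrary deterministic start
    unvisited.remove(tour[0])
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda c: math.dist(last, c))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

print(routine_sort([3, 1, 2]))  # exact answer: [1, 2, 3]
print(nearest_neighbour_tour([(0, 0), (1, 1), (0, 3), (4, 0)]))
```

The first function always returns the one correct answer; the second returns *a* tour, and how good it is depends on the instance. That difference is exactly the engineer’s “solve” versus the mathematician’s.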

This is our first look at how “soft” computing is when tackling truly difficult problems. People frame computation in terms of rigid rules, but this is only true for relatively simple problems with quick solutions. Whenever someone argues against humans being similar to computers, they almost always incorrectly assume that computing machines operate in a precise, rigid fashion.

Technically we can come up with a step-by-step routine to solve an NP problem but it would need to attempt all possible combinations in the search space in order to arrive at the answer. The search space for any reasonably-sized problem is astronomical. The computation would take hundreds of years, if not millennia. Attempting all possible solutions in the search space is called brute force. Brute force takes far too long for an algorithmic approach to be practical in the NP complexity class.

This is why mathematical optimization, using heuristics, is used to arrive at solutions to NP problems. Optimization does not guarantee the best solution (unless it’s lucky), only something good enough given the available resources. This is the approach we must take to “solve” nontrivial (complex) problems in a reasonable amount of time. Figure 6 shows the concept of solving a problem using optimization.


FIGURE 6 Optimization is the only way to solve nontrivial problems in a practical amount of time.

In optimization, problems are defined by an objective, rather than a routine. The possibility space is explored by setting up some goal to pursue. Candidate solutions are assessed by how close they come to realizing the objective. Better solutions are depicted in Figure 6 as a ball sitting lower on some error surface. The lower the ball, the smaller the difference between our goal and the result we calculate.
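The “ball on the error surface” picture can be made concrete with a minimal gradient-descent sketch. Assuming a toy one-dimensional objective (my invention, purely for illustration), each step nudges the candidate solution downhill toward lower error:

```python
# Minimal optimization sketch: minimise an objective by gradient
# descent. Each iteration moves the candidate "downhill" on the
# error surface, like the ball in Figure 6 rolling lower.
def gradient_descent(grad, x0, lr=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)  # step against the slope of the error surface
    return x

# Toy error surface: error = (x - 3)^2, minimised at x = 3.
grad = lambda x: 2 * (x - 3)

x_best = gradient_descent(grad, x0=0.0)
print(round(x_best, 3))  # close to 3.0
```

Nothing in the loop “knows” the answer; it only knows how to get closer to the objective, which is the essence of heuristic solving.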

This approach is at the heart of today’s artificial intelligence. It is no coincidence that AI researchers landed on the use of heuristics to solve complex challenges. Traditional, rules-based programming (routines) cannot solve problems that are nontrivial. Tasks like recognizing faces or driving cars autonomously have search spaces that are far too large for routines to solve practically.

From this point on I will refer to these 2 approaches as routine solving and heuristic solving to differentiate them. Computers must use both of these approaches if they are to solve problems that span the various levels of complexity.

There are also situations in computing where we use meta-heuristics. This is when we have incomplete or imperfect information (and/or limited computational resources). Going meta still falls under the optimization approach, where we look to “learn how to learn.”

Here are some example P and NP problems:

Example Routine Problems (P):

  • multiplication
  • sorting
  • string matching

Example Search Problems (NP):

  • factoring
  • scheduling
  • map coloring
  • protein folding
  • theorem proving
  • packing
  • Sudoku
  • traveling salesman

We can see that machines can tackle problems that span the spectrum of complexity, and that the type of problem tackled dictates the approach a computing machine will take. The significance of how machines tackle problems of varying complexity has major implications for our use of the computing analogy to compare people. As noted above, people are usually arguing against “machine thinking” because they assume computers are only capable of making strict, rigid decisions. But this is NOT how computation solves nontrivial problems.


But it doesn’t stop there. To fully appreciate the power of computing, and its relevance to human cognition, we need to look at universality. It turns out the first hints of such a phenomenon becomes apparent when we take a closer look at the NP-Hard and NP-Complete complexity classes.

HINTS OF UNIVERSALITY


A review of the major complexity classes does more than just allow us to classify problems by the approach needed to solve them. Something more fundamental arises from our understanding of decision problem complexity. At the heart of the P vs NP problem is the realization that lower complexity classes have some innate connection to problems of higher complexity.

The NP-complete problems represent the hardest problems in NP. What makes them “complete” is that if any of them can be solved in polynomial time then ALL problems in NP can be solved in polynomial time. In other words, if anyone ever finds a quick solution to an NP-Complete problem then all NP problems will be quickly solvable (every problem in NP can be reduced to an NP-Complete problem).

The similarities between problems tell us about the relative difficulty of problems. If we can (efficiently) reduce problem A to problem B then A shouldn’t be more difficult to solve than B. This means an NP-complete problem is universal among all NP problems.
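A reduction can be sketched with a textbook pair of problems: a set of vertices S is an independent set of a graph exactly when its complement V \ S is a vertex cover, so either problem can be answered by solving the other. The tiny graph below is an invented example:

```python
# A classic polynomial-time reduction: INDEPENDENT-SET and
# VERTEX-COVER are two faces of the same problem. S is an
# independent set of G exactly when V \ S is a vertex cover.
def is_independent_set(edges, s):
    # No edge may have both endpoints inside s.
    return all(not (u in s and v in s) for u, v in edges)

def is_vertex_cover(edges, c):
    # Every edge must have at least one endpoint inside c.
    return all(u in c or v in c for u, v in edges)

vertices = {1, 2, 3, 4}
edges = [(1, 2), (2, 3), (3, 4)]   # a path on four vertices

s = {1, 3}                         # candidate independent set
print(is_independent_set(edges, s))          # True
print(is_vertex_cover(edges, vertices - s))  # True: the complement covers
```

Solving either problem instantly solves the other; this is the “sameness” between superficially different problems that reducibility exposes.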

Figure 7 shows the concept of all NP problems collapsing into NP-Complete. What I want to highlight here is the significance of this collapse with respect to universality, rather than discuss the usual implications for P vs NP (e.g. RSA encryption). It means ALL decision problems that are nontrivial share something fundamentally in common with “simpler” problems.


FIGURE 7 All problems in NP will collapse into the NP-Complete class of problems if anyone finds a quick solution to an NP-Complete problem. This means all problems have something in common.

This commonality among problems is our first hint that computation has a property to it that is universal. Although our casual perception of everyday problems demarcates them as different in type and complexity, at their core is an undeniable sameness between all problems.

Keep in mind, the P class of problems is a subset of NP. Thus, the commonality between NP and NP-Complete extends into all P problems by default. More interestingly, NP-Complete problems are the most difficult of the NP class, meaning the hardest problems in NP are universally the same as less challenging problems. Going even further, NP-Hard problems, which are at least as hard as the hardest NP problems, have all NP problems reducible to them.

Note that the shared property between problems of differing complexity does not depend on P=NP. Reducibility already proves the connection between problems across all complexity classes.

Problem difficulty has much more to do with our perception of the problem than the inherent properties of the problems themselves.

2 important things to note:

  • The majority of natural problems people are interested in solving belong to NP;
  • We can prove all NP problems are reducible to NP-Complete, without knowing anything about all the other NP problems, other than that they are in NP.

These 2 points have massive implications. Whatever your domain of interest, you are faced with NP type problems. But the perceived difference in problem difficulty across those challenges is somewhat illusory. Even though the problems don’t look anything like each other they are all, in some sense, the same problem.

When it comes to machine thinking we know computation is capable of solving problems across different levels of complexity. We know that the method used to solve different problems changes with complexity. We also know the problems themselves are much less different than we perceive them.

We already have more than enough to start drawing strong comparisons between human cognition and computing, particularly with respect to comparative intelligence between individuals. But some may still find this connection more metaphorical than literal. Recall from the introduction my use of quotations when mentioning the phrase computing “analogy.” If you truly want to use computing to reason about human cognition, whether you agree with the comparison or not, you have to appreciate the full power of true universality.

FROM SYMBOL SHUFFLING TO UNIVERSALITY

True universality in computing goes far beyond mere reduction of problems into other problems. True universality means computers have the representational power to compute anything. Specifically, this means a universal machine can calculate or simulate any kind of pattern.

How can something mechanical capture the richness and variation we see all around us? When I move my hand through water or taste the ripeness of a strawberry I see no signs of some underlying arithmetic churning out the fluid reality I experience.

But at a high level, reality is indeed some kind of computation. Inputs are constantly converted into the outputs we observe; a constant churning as resources are converted from one form into another². From my previous article on intelligence:

[Image: excerpt from the author’s previous article on intelligence]

But to understand how the richness we experience can come from something as seemingly mundane as raw calculation we need to let go of the notion that mathematics and computation are rigid and mechanical. We need to see how symbols can become much more than themselves.

IT ALL STARTS WITH GÖDEL

When we hear about Kurt Gödel we obviously think about his 2 incompleteness theorems. These are theorems of mathematical logic that demonstrate the inherent limitations of every formal axiomatic system capable of modeling basic arithmetic. Together they show that a complete and consistent set of axioms for all of mathematics is impossible.

A formal system is a system of axioms equipped with rules of inference, which allow one to generate new theorems. The set of axioms is required to be finite or at least decidable, i.e., there must be an algorithm (an effective method) which enables one to mechanically decide whether a given statement is an axiom or not.

Gödel’s Incompleteness Theorems:

  1. No consistent system of axioms whose theorems can be listed by an effective procedure (i.e., an algorithm) is capable of proving all truths about the arithmetic of natural numbers. For any such system there will always be statements about natural numbers that are true, but that are unprovable within the system.
  2. (…an extension of the first) Shows that the system cannot demonstrate its own consistency.

Some concrete definitions: Something is consistent if there is no statement such that the statement itself and its negation are both derivable in the system. A formal system is complete if for every statement of the system, either the statement or its negation can be derived (i.e., proved) in the system.

Gödel’s two incompleteness theorems are among the most important results in modern logic, and have various deep implications. Perhaps the most important is what it says about mathematical and computational universality. Gödel’s work underpins the extreme flexibility afforded formal systems, and ultimately the rich universality seen in computation. To understand this important result we need to take a deeper dive into what it is Gödel accomplished.

Gödel Numbering

The notion of mapping is the key to Gödel’s famous paper. This is “mapping” in the mathematical context, where some relation is made between 2 things. The reason mapping is required is because Gödel had to speak about formal systems without using the formal system. This makes intuitive sense. If I want to critique the utility or truthfulness of a system I cannot do it using the system itself.

The system Gödel was critiquing was one devised by Bertrand Russell and Alfred North Whitehead called Principia Mathematica (PM). Russell and Whitehead were attempting to realize Hilbert’s dream of making mathematics complete, using abstract symbols devoid of meaning (by stripping away meaning they ensured only the raw rules of mathematics would be showcased*¹).

Gödel’s mapping is called “Gödel numbering.” Gödel numbering makes it possible to assign a unique number to each sign, formula and proof of the formalism (each symbol in the formalism has a corresponding Gödel number). By using his mapping scheme, Gödel effectively arithmetized the formalism of PM.
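A toy version of the encoding, in the spirit of Gödel’s scheme, can be sketched in a few lines: give each symbol a code, then encode a formula as a product of prime powers. Because prime factorization is unique, the resulting integer can always be decoded back into the original formula. The symbol codes below are invented for illustration; Gödel’s actual assignments differ.

```python
# Toy Gödel numbering: symbol i of the formula contributes
# (i-th prime) raised to that symbol's code. Unique factorization
# guarantees the integer uniquely determines the formula.
SYMBOL_CODES = {"0": 1, "s": 2, "=": 3, "+": 4, "(": 5, ")": 6}  # illustrative
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19]  # enough for short formulas

def godel_number(formula):
    n = 1
    for prime, symbol in zip(PRIMES, formula):
        n *= prime ** SYMBOL_CODES[symbol]
    return n

# The formula "0=0" becomes 2^1 * 3^3 * 5^1 = 270.
print(godel_number("0=0"))  # 270
```

Statements *about* formulas (“the formula with number 270 is not demonstrable”) then become statements about ordinary integers, which is exactly the bridge Gödel needed.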

Figure 8 shows a simplified concept of Gödel’s approach. Gödel used his meta-mathematics to talk about the formalism of PM, but in such a way that his meta-mathematics were intimately connected to the symbols inside PM. It is through this scheme Gödel could get PM to “talk about itself”, immediately creating the possibility for contradiction.

Anytime we go meta we enter the realm of paradox, since if you can talk about a system then the system can appear to contradict itself. Going meta was used by Cantor for his diagonal argument, and by Popper in his critique of science.

For example, by going meta on Mandarin Chinese I can say things like “Mandarin Chinese does not have the word Mandarin in it.” While this makes no sense, nothing stops me from saying it, since I am standing outside the language itself to say it.

Gödel was able to make meta statements like “the formula G is not demonstrable using the rules of PM.” Now this alone isn’t that interesting, but remember, thanks to Gödel numbering, making this meta statement is the same as PM making the statement itself. This is because Gödel showed there was a direct correspondence between the meta-mathematical statements and the formalism of PM.


FIGURE 8 Gödel used meta-mathematics in order to step outside the formalism he was critiquing. But his meta-mathematics were intimately connected to the symbols inside PM. This way, Gödel could get PM to talk about itself. Thus, if he could show his meta-mathematics led to contradictions it meant PM itself was also capable of contradiction.

Gödel showed that all meta-mathematical statements regarding the structural properties of expressions inside a formalism can be mirrored inside the formalism itself. Gödel used his meta statements to show that PM (and any comparable formalism) contains statements that are true yet formally undecidable, and is thus incomplete.

This result was (is) huge. It has major implications for mathematics and epistemology as a whole. But the limitations shown by Gödel are related to a formalism’s ability to prove absolute truths. What we lose in the world of proving we gain massively in the world of computation.

Gödel’s work, it turns out, is a formalism’s gateway to extreme flexibility. To understand this we have to understand what I will call the paradox of restriction.

THE CREATIVITY OF MATH AND COMPUTATION

I designed my cover for this article on purpose. We tend to think of math and computation as rigid and mechanical, but this view is far too limited. While symbols restrict a formalism to its mechanical operations, these operations do not limit what can be represented by the symbols. Math and computation are fluid and highly expressive, but to understand how that’s possible we must appreciate the relationship between recursion and the paradox of restriction.

RECURSION AND THE PARADOX OF RESTRICTION

What if I told you the only way to fully express yourself is to impose limitations on yourself? This paradoxical statement isn’t so crazy-sounding with some thought. If I restrict myself to only eating proper food and exercising I will be free to enjoy a wide range of activities. If I restrict myself to practicing piano everyday (instead of hanging out with friends) I open myself to playing a variety of music. Taking on restriction opens doors.

This paradox is more fundamental than our casual, everyday experience. Mathematics and computation have restrictions that open them up, and in computing this comes by way of representational power.

Representational power is what allows a formal system to simulate/calculate patterns. Figure 9 shows the conceptual difference between an incomplete (restricted) and complete formalism.


FIGURE 9 Incomplete formalisms have more freedom than complete formalisms. Incompleteness means there is “space” for the formalism to “move” around. A hypothetical complete formalism would be severely limited in its ability to calculate patterns.

Which looks more flexible to you? Obviously the one with more room to maneuver. Incomplete formalisms gain their representational power because they are incomplete. If a formalism’s symbols were perfectly consistent and complete they would be incapable of making the irrational contradictions required for universality.


This means the formalism does not need to symbolically express the environment in order to simulate it. People often refer to Gödel’s theorems in terms of what they limit, and then extend that limitation to the larger, more realistic world of computing. This is incorrect.

Becoming More

There is only one way something inherently limited can create something more than itself, and that’s recursion. A formalism must be able to “eat itself” in order to create something beyond its own symbols. This fact is in some sense pretty obvious. Look at the following figure:


FIGURE 10 Restricted systems can create more than themselves via recursion.

Our end result is something we never started with. Of course this is true of any summation, but with summation alone we would run out of symbols to use. The world’s patterns are far too rich to capture using static, non-referencing symbols. An incomplete formalism incapable of recursion would quickly run into a built-in limit in its attempt to simulate rich patterns. But with recursion this limit is removed.
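A small sketch of this idea, using the Thue-Morse sequence as a stand-in example (my choice of illustration, not the author’s): the definition refers back to itself, and from just the two symbols {0, 1} it generates a pattern that provably never settles into repetition.

```python
# "Becoming more" through self-reference: the Thue-Morse sequence
# is defined recursively in terms of itself, yet from two symbols
# it generates a pattern richer than any fixed, non-referencing list.
def thue_morse(n):
    if n == 0:
        return 0
    if n % 2 == 0:
        return thue_morse(n // 2)      # t(2n)   = t(n)
    return 1 - thue_morse(n // 2)      # t(2n+1) = 1 - t(n)

print([thue_morse(i) for i in range(16)])
# [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0]
```

Two rules, two symbols, and recursion produce an endless, non-repeating structure: a miniature of how a restricted formalism creates more than itself.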

Recall our discussion on complex problems. Complexity means a massive number of combinations are possible. For a program to be commensurate with this complexity it must be able to approach a similar intricacy to the problem. This is possible by iteratively folding-in symbols, mixing and matching countless combinations to approximate patterns. It is this looping that gives massive representational power to any formalism rich enough to eat itself.

This is why math and computation are creative. They are not rigid and mechanical, they are extremely fluid and expressive.

“So math is creative, not mechanical, math is biological, not a machine!” — Gregory Chaitin

Recall Gödel’s work on using meta-mathematics to get a formalism to talk about itself. This is the recursion in Figure 10. A formal system with a rich repertoire of symbols will be incomplete when it comes to strict proofs, but it will be extremely flexible when it comes to computation.


The Gödel-Turing Threshold

We’ve seen how symbol shuffling and recursion allow a formalism to take on an immense amount of creativity. With a rich enough repertoire of self-referencing symbols we approach the threshold of true universality. This point has been called the Gödel-Turing Threshold by Douglas Hofstadter. It is the point where basic arithmetic moves from some inanimate system of symbols to what Turing called “Universal Machines.”

Turing realized there was a critical threshold where computational universality comes into play. It is a point where a machine becomes flexible enough to describe its own structure. Turing’s work was based largely on Gödel’s and the connection between them should now be obvious.

“There is nothing more flexible than a universal machine. Universality is as far as you can go.” — Douglas Hofstadter

And so from Gödel we know the richness of integer arithmetic, and from Turing’s abstract theory of computation, later brought into engineering reality by people like John von Neumann, we come to the realization that machines are in fact capable of being universal.


FIGURE 11 The 3 major names behind the most important discoveries related to computational universality.

“Gödel found incompleteness and Turing found uncomputability. But in the process Turing also found complete/universal programming languages, hardware, software, and universal machines.” — Gregory Chaitin

Universality Everywhere

Many don’t appreciate that we see universality all around us, every day. The way we run our personal and professional lives now takes place largely on computers. These machines are converting inputs to outputs, and in doing so, displaying, redirecting, connecting, and capturing an immense amount of richness.

We can hardly tell the difference between what we experience through a machine and what is deemed reality. We consider our mother’s voice on a phone call to be her true voice, or her face on video chat to be her true face. The television shows we watch are reconstructed patterns of real people, CGI creates highly realistic impressions of things we are familiar with, and scientists simulate all kinds of real-world scenarios.

Behind it all is basic, arithmetical symbol-shuffling. Machines chugging through numbers, using recursions to simulate patterns virtually indistinguishable from our everyday experience. Much of our perceived reality arises, at bottom, from the addition and multiplication of integers on hardware.

“…underneath all the colorful pictures, seductive games, and lightning fast Web searches, there is nothing going on but integer arithmetic.” — Douglas Hofstadter

What about Searle?

There is a famous Chinese room argument made originally by John Searle that says a digital computer executing a program cannot be shown to have a “mind”, “understanding” or “consciousness”, regardless of how intelligently or human-like the program may make the computer behave.

I won’t go into the details of Searle’s thought experiment, which readers can easily find online. But his basic premise is that if digital computers can only manipulate symbols then all they can do is convince someone that they understand. Searle argues that no matter how convincing a machine might be, it does not truly understand.

Of course this is barely an argument to begin with. Searle talks as though we all know what “understanding” is. But the real nail-in-the-coffin to Searle’s argument is his complete misunderstanding of complexity. A “mind”, “understanding” or “consciousness” would not arise from naked symbol manipulation, rather it would become possible through the rich expression that self-referencing affords such symbols. In other words, what we observe in complex phenomena emerges from a massive network of interaction.

Programs as Emergence

Sure, the symbol-shuffling in computing comes by way of programs. But just as symbols don’t need to capture all patterns explicitly, programs don’t need to encode all the things the program can do. A program’s rules have less and less relationship to the program’s behavior as complexity increases.

Programs can become more than the sum of their parts.
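A concrete illustration of rules versus behavior is an elementary cellular automaton such as Wolfram’s Rule 110, whose eight-entry rule table is trivially simple while its long-run behavior has been proven computationally universal. A minimal sketch:

```python
# Rule 110: a one-dimensional cellular automaton whose entire
# "program" is eight rewrite rules, yet whose long-run behavior
# is rich enough to be computationally universal.
RULE = 110

def step(cells):
    """Apply the rule to every cell based on its 3-cell neighborhood
    (left*4 + center*2 + right selects a bit of RULE)."""
    n = len(cells)
    return [
        (RULE >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

# Start from a single live cell and watch structure emerge
# that is nowhere written in the eight rules themselves.
row = [0] * 31 + [1] + [0] * 31
for _ in range(20):
    print("".join("#" if c else " " for c in row))
    row = step(row)
```

Nothing in the rule table mentions the triangles and gliders that appear on screen; they are behavior the rules permit, not behavior the rules encode.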

ON THE VALIDITY OF THE COMPUTING ANALOGY

I will now bring the conversation back to Trishank’s latest angle regarding the use of the computing analogy to compare the “smartness” of people. I’ve discussed much of the background I deem necessary for understanding the validity of the computing analogy.

What we know about computation:

  • information is converted from inputs to outputs;
  • routines are used to solve well-defined problems;
  • heuristics are used to solve complex problems;
  • all decision problems are, in some sense, the same problem;
  • basic symbols have immense representational power;
  • computers are capable of being plastic, creative;
  • programs can become more than the sum of their parts.

It’s All Computation

Viewing human cognition as something other than computation always appeals to something almost “mystical.” While one can justify a belief in the numinous on a personal level, there is no reason to suggest cognition is not some kind of computation. Humans are converting inputs to outputs all the time. There is information being filtered, used, and expressed. Once we understand the representational power of computing, the need to argue that humans are not “computers” loses much of its force.

People Use Routines

People use routine-type thinking for shallow, predictable reasoning. Anytime we focus on specific details, where it’s likely an “exact” solution exists, we are using routines. If you find yourself employing a recipe, or believe there is a “way” to do something, this is routine thinking.
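Routine-type thinking can be sketched with a classic exact recipe: binary search. The problem is well-defined, and the routine is guaranteed to find the answer:

```python
def binary_search(sorted_items, target):
    """A routine: an exact recipe that is guaranteed to find the
    answer for a well-defined problem (searching a sorted list)."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1  # not present

print(binary_search([2, 5, 8, 12, 16, 23, 38], 23))  # → 5
```

Every step is dictated by the recipe; there is no judgment involved, which is exactly what makes routines reliable and exactly why they only work on well-defined problems.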

People Use Heuristics

Any problem that is nontrivial (complex) involves the use of heuristics. These are known as mental shortcuts when referring to people. The vast majority of interesting problems we deal with each day are far too complex to solve using recipes (the problem’s search space is too large for the application of a simple routine). When people employ “gut feelings” or general rules-of-thumb they are narrowing down where in the search space they will look for a solution. When domain experts make something difficult look easy they are applying a number of heuristics, learned over the years through exposure to the problem space.
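Heuristic solving can be sketched with hill climbing (my example, not the article’s): instead of enumerating a search space of 2**20 candidates, the program samples a tiny sliver of it, guided by a score.

```python
import random

def hill_climb(score, start, neighbors, steps=500, seed=0):
    """A heuristic: instead of checking every candidate (the full
    search space), repeatedly move to a better neighbor. Fast, but
    only approximate -- it can get stuck on a local peak."""
    rng = random.Random(seed)
    current = start
    for _ in range(steps):
        candidate = rng.choice(neighbors(current))
        if score(candidate) >= score(current):
            current = candidate
    return current

# Toy problem: find a bit string with as many 1s as possible.
# The space has 2**20 candidates; the heuristic samples a tiny sliver.
def flip_one(bits):
    return [bits[:i] + [1 - bits[i]] + bits[i + 1:] for i in range(len(bits))]

best = hill_climb(score=sum, start=[0] * 20, neighbors=flip_one)
print(sum(best))  # close to the maximum of 20
```

The “gut feeling” here is the score function: it narrows down where in the search space to look without ever guaranteeing the global best.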

Types of Thinking

We looked at both routines and heuristics when reviewing decision problems across the major complexity classes. The type of machine “thinking” applied depends on the complexity of the problem being solved. As discussed above, people use both these types of thinking as well.

What is obvious from a computational standpoint often gets relabelled by others. For example, Daniel Kahneman’s Thinking Fast and Slow reflects his thesis on the dichotomy between two modes of thought. System 1 is the instinctive/emotional, while System 2 is the slower/deliberative/logical type of thinking. These are just the differences between routines and heuristics that we see play out in computers*².

Humans as Universal Machines

The human brain shows representational universality. We can internalize external phenomena of many sorts by creating abstractions, keeping only the information necessary to replicate the pattern in our minds. This is also true of machines. The computing analogy holds, as long as we give up the notion that computers are only capable of chugging through a fixed set of directives.

“…human thinking in all its flexible and fallible glory can in principle be modeled by a ‘fixed set of directives,’ provided one is liberated from the preconception that computers, built on arithmetical operations, can do nothing but slavishly produce truth, the whole truth, and nothing but the truth.” — Douglas Hofstadter

It’s About Objectives

We looked at how heuristics are used in computation to solve complex problems. Programs use objectives to help assess how well they are searching the possibility space. The use of objectives in computing overlaps with how people apply heuristics in the real world. While we lack the specific recipe to solve nontrivial problems we often have a goal in mind, and repeatedly apply mental shortcuts until we are close enough to our objective.

This also overlaps with Stephen Wolfram’s Principle of Computational Equivalence. The principle argues that systems found in the natural world can perform computations up to a maximal (“universal”) level of computational power (almost all processes that are not obviously simple can be viewed as computations of equivalent sophistication⁹).

This suggests that the only difference between anything (not just humans) when it comes to so-called intelligence is the objectives involved. Humans have different objectives than other animals, and obviously different objectives than nonliving systems.

“I’ve been fond of quoting the aphorism “the weather has a mind of its own.” It sounds so animistic and prescientific. But what the Principle of Computational Equivalence says is that actually, according to the most modern science, it’s true: the fluid dynamics of the weather is the same in its computational sophistication as the electrical processes that go on in our brains.” — Stephen Wolfram

The demarcation between intelligences may have much more to do with the objective involved. Whatever inputs-to-outputs computation is being realized in a given system, it is not about the level of complexity but rather the “goal” of the system.

SPEED AS FAMILIARITY

This brings us to my main premise. With the background I have laid out it becomes possible to come full circle to the question posed at the beginning.

Can the idea that some people are smarter than others be salvaged by drawing parallels between human cognition and computation?

I argue that cognitive speed (and quality of solution) is not some ability to process faster in the mechanical/reasoning sense; rather, it comes down to exposure to previous patterns. It comes down to familiarity. I mentioned this a while back in a tweet.

Familiarity folds in the notions of both speed and correctness. Being previously exposed to the underlying pattern of a problem means that certain individuals will arrive at solutions more quickly. Importantly, universality means that all people are capable of creating the same analogies. Those analogies are dependent on an individual’s past experiences; their own rich histories that lead to their collection of existing internal patterns.

When analogies are combined with heuristics, which all people are equally capable of executing, the massive search spaces of truly complex problems are narrowed and resolved into a working solution. The natural conclusion is that what differentiates people by “smartness” is how relevant their past experiences are to narrowing down the search space of a given set of problems. It’s not using routines better. It’s not using heuristics better. These are computationally universal across individuals. It’s bringing analogy to bear on the universal use of routines and heuristics to solve problems.

What differentiates people by “smartness” is how relevant their past experiences are to narrowing down the search space of a given set of problems.
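The “speed as familiarity” idea has a direct computational analogue in memoization: the same routine runs dramatically faster when it can recall subproblems it has already seen. A minimal sketch, using Fibonacci purely as a stand-in for repeated patterns:

```python
from functools import lru_cache
import time

def fib_fresh(n):
    """Solving with no memory of past patterns: every call
    re-derives everything from scratch."""
    return n if n < 2 else fib_fresh(n - 1) + fib_fresh(n - 2)

@lru_cache(maxsize=None)
def fib_familiar(n):
    """The same routine, but subproblems seen before are recalled
    instantly -- the speedup comes from familiarity, not a better method."""
    return n if n < 2 else fib_familiar(n - 1) + fib_familiar(n - 2)

for f in (fib_fresh, fib_familiar):
    t0 = time.perf_counter()
    f(30)
    print(f.__name__, round(time.perf_counter() - t0, 4), "s")
```

Both functions embody the identical recipe; the only difference between the “slow” one and the “fast” one is prior exposure to the patterns being asked about.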

The Fabric of Cognition

In the introduction I wrapped quotes around the word “analogy” in relation to computing. We often use the word analogy as though it’s some second-rate version of the phenomenon we wish to describe. But analogy is much more than some pedagogical device. Analogy is at the heart of how people make connections between seemingly disparate domains.

When people are solving problems they are calling upon what they already know. There is a familiarity to any problem given the rich histories we all have.


FIGURE 12 Humans make analogies to connect ideas across domains. When solving problems we call upon our past experience with something similar.

Every thought is in some way an analogy. We are constantly recognizing patterns, even if those patterns are not exact replicas of what we’ve seen before. There will always be some level of isomorphism between patterns. An isomorphism is a formalized/strict analogy. Universality in computing means patterns consisting of numbers can be isomorphic to anything else.
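The claim that number patterns can be isomorphic to anything can be sketched with a Gödel-style encoding (a simplified base-256 scheme for illustration, not Gödel’s prime-power construction): any string maps losslessly to a single integer and back.

```python
def to_number(text):
    """Gödel-style encoding: map a pattern (a string) to a single
    integer, losing nothing. The number and the text are isomorphic.
    Assumes characters fit in 8 bits (plain ASCII)."""
    n = 0
    for ch in text:
        n = n * 256 + ord(ch)
    return n

def from_number(n):
    """Invert the encoding: the integer 'stands for' the text."""
    chars = []
    while n:
        chars.append(chr(n % 256))
        n //= 256
    return "".join(reversed(chars))

n = to_number("pattern")
print(n, from_number(n) == "pattern")
```

The integer and the string carry exactly the same pattern; neither representation is more “real” than the other, which is the point of Hofstadter’s remark below.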

“We should have great respect for what seem like the most mundane of analogies, for when they are examined, they often can be seen as having sprung from, and to reveal, the deepest roots of human cognition.” — Douglas Hofstadter

Douglas Hofstadter refers to analogy as the “fabric of cognition.”

“Wherever there is a pattern, it can be seen either as standing for itself or as standing for anything to which it is isomorphic. Importantly, neither interpretation is truer than the other, regardless of the original intention of the original pattern. If it’s a correct description of what is happening then none of them are privileged.” — Douglas Hofstadter

What is critical to understand is that we cannot know where someone’s isomorphism/analogy will come from. In other words, there is no way to say what life experiences/exposures will lead to the existence of the analogy needed to solve the problem effectively. People will point to the “smartest” individuals, but they are only commenting on the fact that that person’s solution was faster/better. There is little to no transparency into that individual’s set of experiences that happen to overlap with the problem being solved. Note that schooling is a very limited domain with which to test one’s smartness.


FIGURE 13 What does it mean for someone to be “smarter”? I argue that it comes down to making connections between past exposure to patterns and new patterns.

Just as we are blind to the histories that lead to “success” in groups of people (think survivor bias), we are also blind to an individual’s history of patterns. While trivial problems (which can be solved with routines) benefit from known patterns (solving a math problem helps if you’ve seen a similar problem before), nontrivial/complex problems can benefit from previous patterns in unknown ways. The opacity of complexity is fundamental.

THE GENIUS MYTH

This brings us to the idea of “genius.” The term assumes individuals are largely responsible for their discoveries. But every discovery depends on a very large pile of previous discoveries and concepts. Just how many previous discoveries/concepts are required, and how they all relate to each other, is something we are blind to.

Even on smaller scales we see this in any organization.

While we like to talk about human progress as though we benefited from a few individuals making grand discoveries, this is a false narrative. Those individuals are merely the names we attach the discovery to. There is a massive connected network of contribution and previous discovery that flows all the way back throughout history. Random bits of discovery, that sit on top of more discovery, ad infinitum. This network is complex and thus opaque. There is no way to piece together the true story in some linear fashion about how discoveries came to be.

There are no “geniuses” as we like to think of them. There are only names attached to the end of a very long line of connected ideas.

SUMMARY

I’d like to thank Trishank and Ashlesh for inviting me to weigh in on the computational angle of intelligence. I hope my article helps give readers the background necessary to use computing analogies properly, regardless of the reader’s opinion.

REFERENCES

  1. IQ is largely a pseudoscientific swindle
  2. Intelligence, Complexity, and the Failed “Science” of IQ
  3. Why universality trumps IQ
  4. Why no one is exponentially smarter than others
  5. On ‘Why no one can be exponentially smarter than others’
  6. Advanced Physics Simulations
  7. Exponential Time Hypothesis
  8. Current Consensus on P vs NP
  9. A New Kind of Science
  10. Gödel’s Proof (Douglas Hofstadter)
  11. Proving Darwin: Making Biology Mathematical
  12. Beyond Computation: The P vs NP Problem — Michael Sipser

Footnotes (*)

  1. Gödel actually showed that the meaning had not been completely removed from the PM formalism.
  2. Kahneman, as common among psychologists, portrays biases in the pathological sense. As something interfering with more focused thought. This is a limited view of how people think, and doesn’t take into account how adaptation leads to beneficial abilities. System 1 is a byproduct of evolution, arming us with the capacity to solve truly complex problems, whereas system 2 can only operate on simplistic challenges.
  3. What About Trying Many Things at Once? This article focuses on solving problems sequentially. Whether a machine uses routine or heuristic solving it is taking its actions in sequence, one after the other. In fact this is why we must use approximate approaches for nontrivial patterns. We would have to wait far too long for routines to attempt enough combinations to land on an answer. But there is a model of computation called the nondeterministic Turing machine (NTM) that looks at what would happen if actions could be taken simultaneously. This is the topic of Trishank’s second article, which questions an argument some people make: perhaps some people can use their minds as nondeterministic Turing machines. As I mentioned in the introduction, I am less interested in the question around humans accessing some kind of nondeterminism. There are many other models of computation, and I see no reason to believe humans are accessing them. But more to the point, this isn’t needed anyway. To understand why, we need to look at universality.