Francis Gooding · Rocket Science for Monkeys: Sounds before Words

The word is not the thing. In spoken language a word is a distinctive sound or series of sounds. It does not have a ‘natural’ relationship to the thing it stands for. Ferdinand de Saussure theorised that a sign is made up of two parts: the signifier (the physical form taken by the sign, for instance a spoken word or its written representation) and the signified (the concept the signifier picks out). Any signifier in a given language could in principle be different, and could be replaced by another: it is not linked to the concept it represents by any objective necessity. Saussure doesn’t talk about actual things in the world: he concerns himself only with the elements of signifying systems. Other linguists call the thing itself the ‘referent’ of a sign. But all agree that the English word ‘rabbit’ has nothing to do with rabbits, any more than the French and Yoruba words lapin or ehoro do. There is no idea more fundamental to the modern conception of language than what Saussure called ‘the arbitrariness of the sign’.

The trouble is that this may not be completely true. Some words do seem to resemble the thing they describe. Onomatopoeic words, like ‘boom’, ‘click’ or ‘ping’, are the clearest and most familiar case. Linguists call these ‘iconic’ or ‘sound-symbolic’ words: they seem to imitate or take on features of their referent. Early disquisitions on language frequently puzzled over the question of how rational speech could have emerged from the non-speech of animal calls and cries, and Enlightenment treatises on the origins of speech sometimes proposed that iconic words might have been a halfway house. The ancients had similar ideas: in Plato’s Cratylus, Socrates argues that the first names assigned to things would have been given in imitation of their objects, by using the mouth or tongue to mimic their qualities – so something that moves or flows would have been named using the letter ‘r’ because ‘the tongue is most agitated and least at rest in pronouncing this letter,’ while the word for something that was slippery would use the letter ‘l’, since in its pronunciation ‘the tongue glides most of all.’ The conclusion that the origin of naming is to be found in such linguistic imitation is ‘unavoidable’, Socrates says, for ‘there is nothing better on which to base the truth of primary names.’ It is a story about the origin of speech, by any other name.

Saussure steered linguistics away from questions about the beginnings of language: for him it was a red herring, since words take meaning only in relation to one another, within the boundaries of their histories. The study of words can’t illuminate what came before words: there is no thread to be found in language which would help us trace human speech back to the moment of its emergence. ‘No society … knows or has ever known language other than as a product inherited from preceding generations, and one to be accepted as such,’ Saussure says in Cours de linguistique générale (1916). ‘That is why the question of the origin of speech is not so important as it is generally assumed to be. The question is not even worth asking; the only real object of linguistics is the normal, regular life of an existing idiom.’

Yet whether it is worth asking or not, the question of the origin of language never goes away. It remains one of the most fundamental mysteries of human evolution. So far as we know, true symbolic language is unique to the human species. (On the most generous reading it may go a bit further back in the human lineage. And there is an open question about cetaceans – it was recently discovered that the structure of humpback whale vocalisations is remarkably similar to the organisation of human speech.) And language continually recurs as the most probable explanation for the differences between human behaviour and that of all other living things. If you ask why we have been able to make pyramids and spaceships and musical instruments, while no other animal has managed anything of the sort in three billion years, the answer will always cite language as a decisive factor. So the question of how we alone came to be blessed – or cursed – with words is not to be lightly dismissed. But it does come with a serious difficulty: language is an evolved feature of the human organism, but words don’t fossilise like bones. How then to find the missing links?

The Language Puzzle is a grand tilt at that seemingly intractable problem. In it, Steven Mithen marshals the disparate factors and fields of research that might give us some clue as to how language evolved, and tries to build a plausible account of how we ended up as the only speaking animal. Of necessity, the book ranges very widely, because the fields that touch on the evolution of language are in no way unified. Mithen draws from palaeontology, archaeology, primatology, the study of animal communication, linguistics, neurobiology, philosophy of mind, evolutionary genetics and more.

When investigating the ancient past, typically there is at best only partial evidence for a proposed evolutionary sequence: often that evidence will consist of little more than a few morphologically similar fossils, the remnants of creatures separated from one another sometimes by millions of years. And even by these standards, the evolution of human language is a particularly tricky case. Not only is language in large part a behavioural phenomenon, so that the consequences of its development can only be inferred on the basis of ambiguous and circumstantial archaeological evidence (stone tools, traces of fire etc), it is also dependent on the use of soft parts of the body (the tongue and the larynx, but of course mainly the brain), which don’t leave a fossil trace. As a result, it’s hard to ascertain which physical shifts may have accompanied the development of speech.

A further difficulty is that Homo sapiens is the only survivor in the human lineage which split away from the chimpanzee six million years ago. As recently as 100,000 years ago we shared the world with several other kinds of human being – Neanderthals, Denisovans and the like – and before that even more. Could they speak with us? We don’t know, though the human genome shows that people like us were on familiar enough terms with some of the others to produce viable offspring with them. But we’re the only humans left, so have no one else to talk to and no one with whom to compare ourselves. Observing the behaviour and communication of chimps and other primates can tell us all sorts of things, but yields only attenuated forms of evidence about the evolution of language.

All this said, it is inarguable that language is an evolved feature of the human organism: the structure and functioning of the brain and of the soft and hard palates are proof enough. They will have developed in response to sustained, multifarious selection pressures, resulting in more complex communication, more control over the voice, increasingly complex neurological architecture and so on. We can’t be sure what those selection pressures were, though some have speculated that initially they may have had to do with a change from living in forests to surviving in the savannah. But however the human speech apparatus came about, its physical attributes put an upper limit on, for example, the absolute number and kinds of vocal sound that humans can produce: ephemeral as it is, language is enabled and delimited by the physiology of the organism.

At a certain point during the development of human speech, and out of a huge array of more or less complex communicative sounds, the true sign, with its signifier and signified, must somehow have emerged. Is it right to say that the sign – the word – ‘evolved’? Or did sign-words emerge into communication and consciousness out of a complex of other mental and communicative functions that previously did other jobs? Did this happen gradually or suddenly? Can we hazard a guess at what sort of creatures first spoke true words, as distinct from making other kinds of sound? Is there any trace of this transformation? Or was the point of entry into the forest of symbols sealed up behind us long ago?

These questions bring us back to those ‘iconic’ or ‘sound-symbolic’ words. In the 20th century linguistics tended not to take much notice of them. With their awkward and apparently not quite arbitrary resemblance to sounds and textures, their stubborn resistance to change, and their long association with the outmoded inquiry into the origins of language, they were relegated to the status of what Steven Pinker could call, as late as 1994, ‘a quaint curiosity’. But as Mithen makes clear, such dismissals were premature. Recent research suggests that iconic words may after all have the crucial originary role that thinkers from Socrates to Herder assigned to them, as a genuine remnant – or analogue at least – of one of the evolutionary staging posts that marked the way to modern human speech.

Not every linguist, even in Saussure’s day, treated iconic words as a dead end. Onomatopoeia is only the most common kind of linguistic iconism. Experiments in the 1920s suggested there were other ways in which words might have non-arbitrary relations to their referents. In 1929 Edward Sapir, an American anthropologist and linguist who had studied under Franz Boas, conducted an experiment in which subjects were told that the nonsense words mil and mal both meant ‘table’, and asked which they thought ‘seemed to symbolise’ a large or small table. Every subject tested picked mal as the word for a large table, and mil for a small table. This was the case whether they spoke English or Chinese, or were children or adults. Also in 1929, the German psychologist Wolfgang Köhler found that when subjects were asked to say which of the made-up words maluma and takete applied to a spiky shape and which to a round one, they overwhelmingly chose maluma for the round shape and takete for the spiky one. (I tried this on my daughter: she too chose this way, and found it entirely obvious which word went with which shape.) Köhler thought the explanation for these results lay in the way the mouth and tongue have to be shaped and moved in pronouncing the words: he proposed that the sharp staccato sounds and fast movement of the tongue in saying takete encouraged the choice of the spiky shape while the rounder, softer shape of maluma linked it to the round shape. Similarly, Sapir observed that to enunciate the word mal requires the mouth to open wider, thus suggesting a larger object, while mil requires a closing in of the mouth, suggesting a smaller object. (Socrates, too, picked out this correspondence, noting that the sound of the letter ‘i’ is used to ‘imitate all the smallest things’.)

Several decades of indifference later, the American linguist Roger Wescott returned to the problem of iconism. Writing in 1971, he observed that i and ee sounds are preponderant in words signifying ‘small’ (for instance ‘tiny’, ‘light’ or ‘wee’), and suggested that the round-sounding vowels a, o and u are associated with things that are large and slow (as in ‘vast’, ‘huge’ or ‘sluggish’). Many consonants, he went on, have sound-symbolic roles too: words featuring laterals like l (‘in which the tip of the tongue blocks the passage of air’) seem to correspond to smallness or lightness, while labials, made using the lips, as in b or m, are linked to largeness (as in ‘big’, ‘boom’ or ‘massive’). Wescott even pushed beyond the sound and production of words to see iconic elements in morphology, syntax and stress. Words that indicate extension or growth, for instance, often themselves get longer (as in ‘big’, ‘bigger’ and ‘biggest’), and reversals in meaning are frequently signified by a reversal in word order (‘I will’ v. ‘will I?’).

Subsequent investigations in the 1990s and into the 2000s sharpened the accuracy of such observations, finally cementing iconic words and sound symbolism as significant parts of all languages. There is now a mass of work on the subject, which has demonstrated a ‘universal propensity’, as Mithen writes, ‘to associate specific sounds with specific meanings … a considerable proportion of one hundred basic vocabulary items show persistent sound-meaning associations irrespective of language families, environment or culture.’ Large statistical models have compared data from thousands of languages, and the results appear to confirm the suppositions of earlier investigators, Socrates included. The association of i with ‘small’ and r with ‘round’ recurs persistently (it turns out that the association of u and o with big things isn’t so consistent), and n sounds are frequently associated with the nose, l sounds with the tongue and m and u sounds with the breast. (Roman Jakobson suggested in the 1960s that the nasal sounds made by the infant during suckling were the root of words for ‘mother’; the near universal association of l, m and n sounds with words for ‘mother’ is well attested, as is the connection of the hard-stop consonants d and p with the father, in words like ‘dad’ and ‘papa’.) There are also some statistically significant associations the reason for which isn’t obvious, for instance between a and words for ‘fish’ (pla in Thai, psari in Greek, machhli in Hindi, eja in Yoruba, sakana in Japanese, samak in Arabic etc).

Infants and children learn iconic words earlier and more easily than they learn arbitrary words, and iconic words remain dominant in the vocabulary of children until around the age of six, after which there is a gradual shift towards arbitrary words. ‘Iconic words are easier to learn,’ Mithen writes, because ‘their meaning is grounded in the sensations experienced by the child – the sound, size, shape, texture, movement and other properties of the object or action being named.’ By providing a fundamental link between speech sounds and objects in the world, they ‘scaffold the entire process of language acquisition’.

Perceiving a link between a vocalisation and the tactile or visual characteristics of an object is thought to be dependent on ‘cross-modal perception’, the tendency for multiple senses to interact while perceiving something. The most familiar form of this is synaesthesia, whereby sense perceptions of one kind also stimulate parts of the brain used in perceptions of a different kind; one may experience a strong association between a word or sound and a particular colour or shape. It is thought that synaesthesia and other cross-modal phenomena take place because of ‘leakage’ between different parts of the brain, which is common in younger children: before the age of about ten, the developing brain is still so plastic and richly interconnected that there is typically a lot of traffic between regions that later become more distinct.

In 2001, two cognitive scientists, V.S. Ramachandran and Ed Hubbard, published a paper proposing that synaesthetic links between vocalisation, bodily movement and the sense perception of objects could have prompted the creation of iconic sounds in an early human ancestor, thus opening the gateway to speech. They returned to maluma and takete, to the ‘small’ sound of ‘i’, and to other cases in which the movement of the mouth seemed to mimic the meaning of a word, or even the movement of other parts of the body: for instance, when the mouth or lips appear to borrow from the typical action of the hand, as in the numerous words for ‘you’ that involve the ‘pointing’ of the lips towards another person; or the way in which the making of the small i or ee sound could correspond to the pincer action of forefinger and thumb when picking up something small. If synaesthetic links were operating in the increasingly flexible brains of early hominins, perhaps they could have had an effect on vocalisations, resulting in the creation of the first mutually intelligible words – mutually intelligible because their meanings would have been established through shared experience.

Fascinating though all this is, it remains, like so much in the field, a hypothesis lacking crucial evidence. It is also dependent on a loose analogy between early childhood and the evolutionary past – an echo of Ernst Haeckel’s largely discredited idea that ‘ontogeny recapitulates phylogeny,’ or that the developing organism moves successively through the forms of its ancestors. The first question must be whether there is any evidence for cross-modal or synaesthetic perception in the vocalisations of other primates. It does seem to be the case that rhesus monkeys associate longer, louder sounds with the larger monkeys who make them (though this is surely not rocket science, even for a rhesus monkey). And a test involving human and chimpanzee subjects found that both associated bright colours with high-pitched sounds and dark colours with low sounds. However, chimpanzees – even those few trained to understand human language – ‘fail’ the maluma/takete test. The best conclusion that can be drawn from such results is that the last common ancestor of humans and chimpanzees may have had at least some cross-modal perceptual ability, and that more complex synaesthetic links between sounds and objects developed in the human lineage sometime later.

That is pretty thin gruel, and as Mithen points out, even if we accept the proposal that language began in iconic words we still ‘need a mechanism for the transition to lexicons dominated by arbitrary words’. Because they are tied to sense experience, iconic words have only limited capacity to communicate concepts and ideas, and can’t always differentiate between closely related things. Mithen gives the example of a sound-symbolic word for a fast-flying bird that by its nature can’t distinguish between two slightly different fast-flying birds. The argument is that in creating the first iconic words through synaesthetic associations, iconism also prompted the development of the arbitrary words that would follow, since only arbitrary words could differentiate the finer textures of experience that iconic sound symbolism had disclosed to those beings who could use it. It’s a good idea, but once again there is no evidence for when or how this might have happened. Ultimately, Mithen is forced to conclude that, for all the research and thinking done about iconism, synaesthesia and cortical leakage, nobody has any idea when or how the arbitrary sign emerged.

Faced with such frustrating stubs, plausible speculations and partial truths, Mithen turns to archaeology. When were there leaps in the design and innovation of stone tools? When did controlled fire start to be used widely? And after that, the brain. What is the timeline for increases in brain size among hominins? What can we learn from casts of the brain cases of extinct human relatives? Mithen quarters the field with care and imagination, cross-checking, noting correspondences and filling in blanks, eventually emerging with a carefully synthesised narrative account of the way language developed and finally flourished into modern speech across three million years of human evolution. We are left with the impression that the ancients’ chief error was in thinking that the process took place consciously among already fully human people.

We now see that instead of a prisca lingua created by principal name-givers or people in a state of nature, the process of language acquisition took place over millions of years in the bodies and minds of a series of ancient beings: Homo habilis, Homo erectus, Homo heidelbergensis, Homo neanderthalensis and, finally, the last human standing, Homo sapiens. No doubt there were many others. At some point, perhaps a hominin made a sound that mimicked the movement of a snake or a fish, or a sound that was small like an insect, and eventually these became words; and then a sound that was originally made in imitation of something steadily departed from it in everyday usage, until it was no longer mimicry but was instead an abstract word; and then another abstract word was needed in order to be more specific about how to make a stone tool, and people told stories around the fire, and mothers cooed at babies and so on and so on, until we arrived at modern language. The potential role of ‘motherese’ in language evolution, and the perspective that a more female-centred history of language evolution might provide, don’t get much airtime in The Language Puzzle. There are many more fires, hunts and stone tools in Mithen’s story than babies and comforted children, and when it comes to thinking about talking it isn’t immediately obvious why that should be so. But however the story is told and whatever refinements may be made, the arrival of the arbitrary sign is the crux, and although we know more than ever about what may have preceded it, what was necessary for it to happen, and what the first sort-of sign may have been, the event itself remains stubbornly out of reach. In the beginning was the word, and the word is lost.