Exploring Linear A

lineara.xyz

153 points by mwenge 3 years ago · 55 comments

retrac 3 years ago

For the unfamiliar, Linear A is an ancient script associated with the Minoan civilization on the island of Crete, from around 1800 to 1500 BC. The later Linear B system encodes archaic Greek and is very similar to Linear A in glyph form. The Minoan language written with Linear A is probably unrelated to any other known language.

Phonetic values are necessarily borrowed from Linear B or are otherwise guesses. It's very likely there was a great deal of overlap: the symbol representing, for example, the syllable "ni" in Greek probably represented a syllable that sounded a lot like "ni" in Minoan. (Linear B is quite unsuited to writing Greek sounds, an indicator that it was borrowed from a very different language.) But since the language of Linear A remains undeciphered, that is really just an educated guess at best.

OfSanguineFire 3 years ago

Work by amateurs on Linear A does not have a good track record. Since the dawn of the internet era it has drawn more crackpots than almost anything else language-related. Within the professional linguistics community, if someone comes along and claims that he has made any progress towards decipherment, it is generally met with skepticism so strong that one questions that person’s mental health. That said, this website has a caveat that it is for recreational use only, and it points to John Younger’s page at the University of Kansas for something serious. Lay readers on HN should take that caveat very seriously.

  • p-e-w 3 years ago

    > Work by amateurs on Linear A does not have a good track record.

    Linear A is completely undeciphered, so amateurs have done exactly as well as professionals. Meanwhile, Egyptian hieroglyphs, cuneiform, and Linear B were all deciphered by people who would be called "amateurs" by today's standards.

    But hey, why miss an opportunity for elitist gatekeeping, even if the topic is demonstrably one of the least suitable places for it?

    • greggsy 3 years ago

      > Linear A is completely undeciphered, so amateurs have done exactly as well as professionals.

      Academics have absolutely done better than amateurs by virtue of continually validating the fact that it isn’t decipherable.

      The comment isn’t even bashing amateurs; it’s bashing crackpots, who tend to be drawn to the mysterious, especially if their crackpot ideas won’t be inconvenienced by facts.

      • chickenbittle 3 years ago

        Yeah I think it's really important to distinguish between 'amateur', and 'crackpot' here.

        Like the amateur who recently stumbled upon that never-repeating tile.

        https://www.quantamagazine.org/hobbyist-finds-maths-elusive-...

        It still required academics to confirm it (and, I think, to realize its significance).

        In short it's okay to be an academic and it's also okay to be an amateur investigator. Both can and do contribute to the advancement and dissemination of knowledge.

        Crackpottery and pseudoscience not so much.

  • heyitsguay 3 years ago

    Is this site not just a handy visual catalog of known artifacts and transcriptions? Is there some speculative decipherment implied in the phoneticizations?

  • rustymonday 3 years ago

    An architect decoded Linear B.

    • OfSanguineFire 3 years ago

      An architect with significant training in the field, who did his work in close collaboration with the professional scholar John Chadwick. Plus that script had a relatively large corpus and, moreover, it encoded an earlier form of a language we already knew (and we already knew the sound values to expect from earlier Greek, like labiovelar consonants, from comparative Indo-European reconstruction). Not the case with Linear A.

    • dmvdoug 3 years ago

      I mean, yeah, but an architect with advanced classical language training.

    • light_hue_1 3 years ago

      It's really misleading to say "An architect decoded Linear B."

      When Michael Ventris was working by himself he published junk: a basically crackpot theory, immediately debunked, that Linear B was Etruscan. Then Ventris worked hard to become an insider.

      Many key observations for the decoding were made by someone else, a classicist, Alice Kober, right before her untimely death. She worked for 20 years on Linear B and laid all of the foundations. She established that Linear B has grammatical roots and suffixes, and that the language is inflected, with case, gender, etc. Kober was one of the first people to work systematically, finding patterns and documenting her methods. The work Ventris did would have been impossible without Kober's methods: extending her work is what succeeded and gave Ventris his main idea.
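The stem-and-suffix grouping attributed to Kober above can be sketched as a small Python function. This is a much-simplified illustration, not her actual procedure: it assumes each word is a hyphen-separated sign group whose ending lives entirely in the last sign, and the sign labels (A, B, C, ...) and the `min_forms` threshold are hypothetical placeholders, not real Linear B readings.

```python
from collections import defaultdict


def inflection_sets(sign_groups, min_forms=3):
    """Group sign groups by stem and keep stems attested with several endings.

    Simplifying assumption: the stem is every sign except the last one,
    and the last sign carries the (possibly inflected) ending.
    """
    endings = defaultdict(set)
    for group in sign_groups:
        signs = group.split("-")
        if len(signs) < 2:
            continue  # a one-sign group has no separable ending
        stem = "-".join(signs[:-1])
        endings[stem].add(signs[-1])
    # Keep only stems that show up with at least `min_forms` distinct endings.
    return {stem: sorted(ends) for stem, ends in endings.items() if len(ends) >= min_forms}


# Hypothetical toy corpus; the labels are placeholders, not real signs.
corpus = ["A-B-C", "A-B-D", "A-B-E", "A-B-C", "F-G", "F-H", "I-J-K"]
print(inflection_sets(corpus))  # {'A-B': ['C', 'D', 'E']}
```

A stem that keeps recurring with a small, fixed set of different endings is the kind of pattern that suggests inflection rather than unrelated words.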

      Ventris briefly worked with Kober. It didn't go well. But over time Ventris came to know the key players and to be accepted in the inner circle. One of these players, Emmett Bennett, gave him what Kober did not have: the Pylos tablets. By the time they were published she had died.

      Ventris extended Kober's work to the Pylos tablets. Her work focused on systematically analyzing groups of characters. When he looked at the results, he made his first critical observation: some groups were unique to the Knossos tablets and others were unique to the Pylos tablets. What if these are place names?
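The first observation described above, that some sign groups occur only at Knossos and others only at Pylos, amounts to a set difference between two corpora. A minimal Python sketch, assuming each corpus is simply a list of transliterated sign groups; the `min_count` threshold and the toy data are hypothetical, added only for illustration.

```python
from collections import Counter


def exclusive_groups(corpus, other, min_count=2):
    """Sign groups that recur in `corpus` (at least min_count times) but never in `other`."""
    counts = Counter(corpus)
    seen_elsewhere = set(other)
    return sorted(g for g, n in counts.items() if n >= min_count and g not in seen_elsewhere)


# Hypothetical toy corpora; the labels stand in for real sign groups.
knossos = ["A-B-C", "A-B-C", "D-E", "F-G-H", "F-G-H", "I-J"]
pylos = ["D-E", "K-L-M", "K-L-M", "I-J", "N-O"]

print(exclusive_groups(knossos, pylos))  # ['A-B-C', 'F-G-H']
print(exclusive_groups(pylos, knossos))  # ['K-L-M']
```

Groups that recur at one site but never appear at the other are the natural place-name candidates the comment describes.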

      There aren't that many place names to be had on Crete, and he knew their Greek forms. So he looked for possible combinations and used them to guide the decoding. He used Kober's work and the place names, along with help from at least Bennett, to build a rough mapping from some signs to sounds. And then he made his second critical observation: what if Linear B is Greek, since Greek place names seemed to appear?

      Then he could try to decode word after word. And along the way he made his third critical observation: many Mycenaean scribes were incredibly sloppy spellers. We can even tell now that some were much better than others, but everything is very messy because even the basic rules of spelling weren't agreed on yet. Not only were characters missing, but a single character could at times stand for one of 30+ different syllables. Bare statistical methods alone often resulted in a mess because of this.

      Only small parts of the text could potentially be decoded at this point. None of the classicists that Ventris normally talked to were convinced.

      That's when John Chadwick, a linguist, heard about Ventris and tried his idea out. Chadwick was an expert in very old Greek, 1000 years older than Plato. Chadwick was quickly convinced by Ventris because while the decodings were very poor for someone who knew classical Greek, they made a lot more sense to him. They worked together for several years to fix up the decoding.

      An architect did contribute the main idea for the decoding, but an architect that was a connected insider, with a background in Greek and Latin, who had published in the area before, knowledgeable in all of the latest methods, with access to privileged information, in conversation with the experts.

      The way you put it, it sounds like some random architect somewhere looked at Linear B, worked hard on their own, and came up with the answer. That's not even remotely true.

      • goodbyesf 3 years ago

        > The way you put it, it sounds like some random architect somewhere looked at Linear B, worked hard on their own, and came up with the answer. That's not even remotely true.

        But that's true of everything. Newton didn't invent calculus (neither did Leibniz). He didn't even understand the idea of a limit. It took contributions of many people over many decades and even centuries to develop the discipline of calculus. Not to mention his ideas came from the ancient Greeks, et al. The same applies to Einstein and, of course, the most overrated and misrepresented Turing.

        The idea of a lone genius or a singular great man who works by himself to produce something great is a lie. Brady didn't win 7 Super Bowls by himself, Jobs didn't create the iPhone by himself, and Musk really didn't create anything by himself. It's just PR that creates heroes out of mere mortals.

      • xdennis 3 years ago

        > The way you put it, it sounds like some random architect somewhere looked at Linear B, worked hard on their own, and came up with the answer. That's not even remotely true.

        Your assumption here is that a normal discoverer works entirely by himself, but the norm has always been that a discovery is the work of multiple people.

        When someone says that an architect decoded Linear B he means to say that a non-professional decoded it.

        You can't just say: "oh, it's not fair to say that because he was actually really good at it".

      • hillsboroughman 3 years ago

        Very informative summary. A stupid question follows, though, so I request your patience. Did Michael Ventris really ask the question 'what if Linear B encoded some form of Greek'? Didn't Alice Kober already ask and answer this question, without seeming to do so? The fact that the underlying language was an inflected one and that it seemed to have singular, dual and plural forms for nouns, etc.: wasn't that enough? Was it academic carefulness that prevented Kober from proclaiming it was ancient Greek?

        • light_hue_1 3 years ago

          Kober was very determined that systematic analysis of the text would eventually work. She rejected the idea that you could just hypothesize what language it was. Because so many people had tried and failed that way.

          Maybe at some point she had this idea. But you really must understand how bad a fit classical Greek, and even the early Greek dialects, really is. Like... a few words work out here and there. What convinced Chadwick were the place names, some names of gods, and one particularly long 13-symbol patronymic. But for anything more you had to start adding, removing, and reinterpreting characters and assuming that the original text got them wrong.

          Also, Kober was missing most of the text, since it hadn't been published yet; with her small corpus this would have been even worse.

          Even after people saw the decoding the main sticking point for years was that you need to make so many changes for it to work out in Greek that you're just making up the text. It took decades of work to make the decoding work and many of the decodings Ventris put forward were found to be wrong.

          Eventually Kober maybe would have worked with Chadwick or someone similar who knew a more archaic variant or maybe Chadwick himself would have noticed it.

        • smallnamespace 3 years ago

          Most Indo-European languages are inflected and have singular, dual, and plural forms (if not in the modern language, then in a more archaic form).

          Even Latin retains some dual forms for certain words, even though it had otherwise lost the dual.

          • hillsboroughman 3 years ago

            Homeric Greek had only sporadic use of the dual; it was apparently a matter of metrical convenience. Classical Greek had all but lost the use of the dual, and the dual was lost in Latin. In Mycenaean Greek, by contrast, the dual number was mandatory for both verbs and nouns, as in Sanskrit. It is well known that Miss Kober went to considerable personal expense (?) and effort to travel from New York to New Haven to learn advanced Sanskrit. I feel there is every reason to believe that Miss Kober already guessed Linear B encoded a form of archaic Greek and that her triplets more or less spoke to this informed guess. Just my 2c.

    • thaumasiotes 3 years ago

      It is not clear why his decipherment is accepted as meaningful. It has faced significant criticism: https://sci-hub.se/https://www.jstor.org/stable/20162981

      > The Ventris system thus set forth has been widely accepted by Greek scholars, including many of the highest eminence, in many countries. It has also been widely rejected by scholars of eminence, in varying degrees.

      > These Ventrisian rules enable bits of a curious sort of Greek to be got out of Lin[ear] B texts; but experiments have shown that bits of English or Latin or other tongues, when spelt out in syllables according to the Ventrisian system, are capable often of yielding bits of Greek just as plausible as anything in the Ventris-Chadwick Documents volume. One eminent Oxonian, dining at a high table, amused himself by taking the names of the Fellows of the College present and turning them into Ventrisian syllables, from which he made a new translation of them into Greek, in which they all turned out to be Greek gods.

      > gentle reader, pray perpend the syllable-groups (reference number Dy 401), that run: a-ma wi-ru-qe ka-no to-ro-ja qi-pi-ri-mu a-po-ri. Here we have two specimens of the labio-velars, the syllables with q-, discovered by Ventris, to the astonishment of philologists who had not expected to find them in Bronze Age Greek. qe is, of course, equivalent to Latin -que, Greek te, while qi doubtless here shows the development to a voiced dental noted by Ventris and Chadwick in their "Mycenaean Vocabulary,"

      > The Greek evaluation of the sentence would be, according to Ventris's spelling rules, halmai wiluite kainōs Tholoiai Diphilimus apolis: "With brine and slime in novel fashion at Tholoia (the place of tholoi, beehive tombs) Diphilimus (is) cityless." No doubt this is a record of a Bronze Age tidal wave.

      > It is by coincidence that the acumen of Mr. Michael C. Stokes, the Edinburgh authority on ancient philosophy, has extracted the Virgilian hexameter, Arma virumque cano Troiae qui primus ab oris....

      > Note that in this sentence one need assume only two of the six words to be names of persons or places, whereas, in the Lin B material as a whole, 75 per cent of the sign-groups have to be, on Ventris's system, evaluated as names
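The quoted critique turns on how much information the Ventris spelling conventions throw away. The Python sketch below encodes a simplified subset of them, stated here as assumptions rather than the complete rules: word-final consonants are not written; a syllable-final m, n, r or s before another consonant is omitted; l is folded into r, b into p, c and g into k, v into w, qu into q; an intervocalic i becomes j while the i of a diphthong is dropped; and the earlier consonants of a cluster get a "dummy" copy of the following vowel. Treating "qui primus" and "ab oris" as single sign groups follows the article's own transcription.

```python
VOWELS = set("aeiou")
# Simplified, assumed letter folding: no separate b, c/g, l, v or h series.
FOLD = {"b": "p", "c": "k", "g": "k", "l": "r", "v": "w", "h": ""}
# Syllable-closing consonants left unwritten before another consonant
# (l has already been folded into r at this point).
DROPPED_CODA = set("mnrs")


def normalize(word: str) -> str:
    """Lowercase, merge qu -> q and ae -> ai, then fold letters into the assumed series."""
    w = word.lower().replace("qu", "q").replace("ae", "ai")
    return "".join(FOLD.get(ch, ch) for ch in w)


def mark_glides(w: str) -> str:
    """Turn intervocalic i into j and drop the i of a diphthong (ai, oi, ei)."""
    out = []
    for k, ch in enumerate(w):
        prev = w[k - 1] if k > 0 else ""
        nxt = w[k + 1] if k + 1 < len(w) else ""
        if ch == "i" and prev in VOWELS and nxt in VOWELS:
            out.append("j")
        elif ch == "i" and prev in VOWELS:
            continue  # second element of an i-diphthong is not written
        else:
            out.append(ch)
    return "".join(out)


def spell(word: str) -> str:
    """Spell one word as hyphenated Linear B-style syllables (simplified rules)."""
    w = mark_glides(normalize(word))
    syllables = []
    k, after_vowel = 0, False
    while k < len(w):
        start = k
        while k < len(w) and w[k] not in VOWELS:
            k += 1
        cluster = w[start:k]
        if k == len(w):
            break  # word-final consonants are not written
        vowel = w[k]
        k += 1
        if after_vowel and len(cluster) >= 2 and cluster[0] in DROPPED_CODA:
            cluster = cluster[1:]  # e.g. the r of "arma" is omitted
        for c in cluster[:-1]:
            syllables.append(c + vowel)  # cluster consonant with a dummy vowel
        syllables.append(cluster[-1:] + vowel)
        after_vowel = True
    return "-".join(syllables)


words = ["arma", "virumque", "cano", "troiae", "quiprimus", "aboris"]
print(" ".join(spell(w) for w in words))
# a-ma wi-ru-qe ka-no to-ro-ja qi-pi-ri-mu a-po-ri
```

Under these assumed rules the opening of the Aeneid comes out exactly as the sign groups the article presents as "Dy 401"; whether such strings then read as Greek is the separate question argued over in the replies.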

      • OfSanguineFire 3 years ago

        You cite a 1965 article. That is practically ancient, and no, its criticism is not particularly significant. In the decades since, Ventris’s decipherment has overwhelmingly been accepted by scholars. That is not to say that all of Ventris’s readings are accepted – many are superseded. But the fact that Linear B records Mycenaean Greek along the general lines that he and Chadwick worked out, has long been beyond doubt in the field.

        Mycenaean sources and their consensus readings will be discussed in any decent introduction to the history of the Greek language. I can recommend, for example, the relevant chapters in A Companion to the Ancient Greek Language ed. Bakker and in Colvin’s A Historical Greek Reader as fairly accessible to a general audience.

        • thaumasiotes 3 years ago

          > You cite a 1965 article. That is practically ancient

          > the fact that Linear B records Mycenaean Greek along the general lines that he and Chadwick worked out, has long been beyond doubt in the field.

          What are the major developments since 1965 that strengthened the position of Ventris's decipherment?

          • theoldlove 3 years ago

            Well, for one, a bunch of additional tablets discovered at Thebes in the 90s, which broadly match and hence confirm the decipherment. https://en.m.wikipedia.org/wiki/Thebes_tablets

            • thaumasiotes 3 years ago

              When the criticism is that your paradigm for translating Linear B is so unprincipled that your translation will say whatever you want it to say (compare "One eminent Oxonian, dining at a high table, amused himself by taking the names of the Fellows of the College present and turning them into Ventrisian syllables, from which he made a new translation of them into Greek, in which they all turned out to be Greek gods": the destination is known before the journey begins), how can the confirmation of older Linear B tablets by newer Linear B tablets address that criticism?

              • theoldlove 3 years ago

                Just take 10 minutes and skim the book chapters. The rules of the script are nowhere near as loose as you say. For example, Linear B doesn’t differentiate between k/g/kh like alphabetic Greek does (κ,γ,χ) — an important distinction, sure, but its loss doesn’t let you turn anything into anything else.

                So with the Theban tablets, if the decipherment were false it should have yielded nonsense when applied to unknown texts.

                • thaumasiotes 3 years ago

                  > So with the Theban tablets, if the decipherment were false it should have yielded nonsense when applied to unknown texts.

                  How is this claim compatible with the observation that, when applied to a text written in Latin, the decipherment fails to yield nonsense?

                  • theoldlove 3 years ago

                    In your very article the decipherment does yield nonsense when applied to Latin. Your article converts the first line of Vergil to Linear B and then tries to understand it as Greek, offering “With brine and slime in novel fashion at Tholoia Diphilimus (is) cityless.” But that’s nearly totally meaningless.

                    And even this sentence requires cheating — most prominently, Greek (both in Linear B and later) doesn’t use the -us ending like Latin does, so its use here in a “Greek” sentence is very suspicious.

                    • thaumasiotes 3 years ago

                      > Your article converts the first line of Vergil to Linear B and then tries to understand it as Greek, offering “With brine and slime in novel fashion at Tholoia Diphilimus (is) cityless.” But that’s nearly totally meaningless.

                      This is a pretty odd claim. The sentence is grammatically coherent and the semantics are... there. They're hard to understand, but that's true of essentially all ancient writing; this problem becomes obvious when we try to date historical events by reference to astronomical phenomena that contemporary texts mention. It's easy for us to calculate the precise dates of interesting astronomical phenomena more than a thousand years in the past... but it's difficult to determine exactly what the texts of that period mean when they describe astronomical anomalies.

                      (For something similar and much more recent, here's part of the introduction to The Troubled Empire: China in the Yuan and Ming Dynasties:

                      dragons were spotted seven times in the seventeen years from 1351 through 1367. In that final year, the Yuan dynasty's last, there were two spottings. The first, on July 9, was in Beijing. A dragon emerged in a flash of light from a well in the palace of the former crown prince and flew off.

                      What happened there?)

                      Why is "Diphilimus is cityless" more meaningless than a standard Linear B inscription such as "small jar, no handles: 1 handleless jar"? By most accounts there is more meaning in the sentence about Diphilimus.

                      > Greek (both in Linear B and later) doesn’t use the -us ending like Latin does, so its use here in a “Greek” sentence is very suspicious.

                      Reasonable. But there are many Greek names that do end in -eus, such as Theseus, Perseus, Odysseus, Achilles, and Zeus; the reader cited above specifically suggests that the Ventrisian sequence ai-ke-u might be interpreted as the personal name Αἰγεύς. Diphilimos does not appear to be a classical Greek name, so we don't seem to be committed to any particular such form.

                      Lacking syllable-final -n and -s, how would we represent the name of Tiryns in Linear B?

          • OfSanguineFire 3 years ago

            Most of the Chadwick part of the Chadwick–Ventris collaboration was published after 1965. And I just pointed you to two popular references that, in turn, cite a number of publications from recent decades. I suggest you follow up on that.

            • thaumasiotes 3 years ago

              Oh, I certainly will.

              But I was kind of hoping for some indication that developments of that kind actually occurred; it would be the least surprising thing in the world to see a selection effect in the study of Linear B inscriptions whereby students who couldn't reconcile themselves with the idea that decipherment will happily assign a meaning to any text, even where the actual meaning of the text is known to be different, left the field, while students who didn't mind that stayed in. Over time a strong consensus in favor of the position "no, I didn't waste the last 30 years of my life" is exactly what you'd expect to see.

              There are no professions in which the professional consensus is "actually, none of this works". But there are many in which that is the truth.

  • delhanty 3 years ago

    Curious, what concrete progress have professional linguists made on deciphering Linear A?

    • OfSanguineFire 3 years ago

      None. And that is in spite of massive attempts over the 20th century, including some of the first applications of computers to a problem of this nature. The conclusion drawn from this lack of progress is that the corpus is simply too small for decipherment and/or we lack any surviving relatives for the language that the script recorded.

nologic01 3 years ago

I sometimes wonder how much further we will be able to lift the veil of ignorance covering early civilizations (assuming our continued existence, cultural interest in the past, and ever more powerful technologies in the aeons to come).

Clearly there must be additional Linear A inscriptions in Crete and possibly elsewhere. The cost of finding them enters a spiral of diminishing returns, but that may be remedied at some point.

But, even so, there is no guarantee that, even with all surviving artefacts uncovered, we would be able to reconstruct the language.

Presumably that "edge of knowledgeable history" calculus plays out across many regions, and sometimes ignorance is annoyingly "close" to the modern era. Even long after the invention of writing, the vast majority of human culture was not recorded and is essentially lost.

  • p-e-w 3 years ago

    > Clearly there must be additional Linear A inscriptions in Crete and possibly elsewhere.

    There are probably many of them in storage in various museums and antiquities departments.

    The vast majority of ancient inscriptions ever found are uncatalogued, and some have never even been looked at. This includes inscriptions in languages we already know how to read. I remember reading long ago that well over 90% of all known Egyptian hieroglyphic texts haven't been translated yet.

    Therefore, my guess is that once AI is good enough to do classification and translation automatically, there will be rapid progress, without requiring any new discoveries.

shaftoe444 3 years ago

Very weird to see this; I went to an exhibition about Knossos in Oxford only today.

Good episode here that covers a bit about the language and translation efforts. The translation of Linear B is a very cool story too.

https://www.bbc.co.uk/programmes/b01292ts

dghughes 3 years ago

I like writing systems and scripts, especially obscure or ancient ones. It never even dawned on me to think about my local region the way I did about ancient Egypt, Greece, Italy, etc.

I was talking to a friend who is Mi'kmaq. Here in Canada we call the people here First Nations; in the USA it's Native American. He said that the Mi'kmaq had an old writing system. I checked into it, and it predates any contact with Europeans and is one of the very few writing systems by native peoples here. It's called suckerfish writing or suckerfish script, a name inspired by the tracks the fish makes in sand.

https://en.wikipedia.org/wiki/Mi%EA%9E%8Ckmaw_hieroglyphic_w...

  • AlotOfReading 3 years ago

    The traditional definitions by linguists tend to exclude anything that can't represent "all oral communication", relegating it to proto- or partial writing systems, which are often pejoratively labeled mnemonic systems. Systems that can represent the full range of spoken expression are labeled "true" or "full" writing.

    This had the convenient side effect of neatly classifying all the American writing systems as protowriting in the early 20th century, as well as some more controversial examples like Chinese. Some of those have since been walked back (e.g. Mayan), but most remain in that limbo. We have a somewhat better understanding today that there was a huge variety of visual communication systems across the Americas prior to European contact, but properly redefining the term "writing" to include them is a slow, ongoing process.

fiddlerwoaroof 3 years ago

Looks like there’s a parallel site for Linear B: https://linearb.xyz/

devoutsalsa 3 years ago

On a related note, the Heraklion Archaeological Museum on Crete is fantastic. 100% worth a visit if you like old stuff. One of the things on display is the Phaistos Disc, one of the best-preserved Minoan-era inscriptions, though its script is distinct from Linear A.

https://maps.app.goo.gl/rwJVDVDjaoNJjaNH8?g_st=ic

https://en.m.wikipedia.org/wiki/Phaistos_Disc

ocschwar 3 years ago

The interface is difficult to deal with, but TIL that a Linear A potsherd was found at a Philistine site.

VectorLock 3 years ago

Probably getting a bit more popular notice after the mention in the latest Indiana Jones movie (at least, they mentioned Linear B a few times)

davedx 3 years ago

Via this post I found the book "The Riddle of the Labyrinth" about the people who deciphered Linear B. Thank you Hacker News, looking forward to reading this!

cubefox 3 years ago

Related thought: Imagine we received a lot of text in an alien language with a radio telescope, with no "Rosetta stone" to decipher it. Say, 1 TB worth of text.

Now we add to that data another 1 TB of English text, and train an LLM on the 2 TB of data. Then we ask the model (in English) to translate some text from the alien language to English.

Would it work?

  • DemocracyFTW2 3 years ago

    No. You always need some kind of Rosetta stone or other relationship to a known language, plus some context and 'plausible guesses', to understand an unknown language. Sure, if I gave you III,IIII;VII / II,II;IIII / VI,II;VIII you would be able to guess that these are elementary number signs in what amounts to a rudimentary table of additions. That much would be true whether the snippet came from a potsherd of an ancient civilization or was received from outer space via a radio antenna. But outside of context (and nothing would be more out of context than an extraterrestrial culture) you cannot even tell with certainty whether I stands for 'one' or 'ten' or 'twelve' or 'thousand', and here we've already reached the end of what a text per se can tell you about its meaning if the signs are not clearly pictorial. Even pictorial scripts like early Chinese or Egyptian hieroglyphs are already conventionalized to the degree that, for quite a number of signs in either script, we are to this day not sure what they depict.
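The point about the tally snippet can be checked mechanically. Under the hypothesis of an additive notation in which V is worth five I's, every row works as an addition whatever absolute value I stands for, while a different V-to-I ratio breaks it. A small Python sketch; the `unit` and `v_ratio` parameters are hypothetical knobs introduced only to vary the reading.

```python
def value(numeral, unit=1, v_ratio=5):
    """Additive reading: each I counts one unit, each V counts v_ratio units."""
    weights = {"I": 1, "V": v_ratio}
    return sum(weights[s] for s in numeral) * unit


def table_consistent(rows, unit=1, v_ratio=5):
    """True if every 'a,b;total' row satisfies a + b == total under the reading."""
    for row in rows:
        operands, total = row.split(";")
        a, b = operands.split(",")
        if value(a, unit, v_ratio) + value(b, unit, v_ratio) != value(total, unit, v_ratio):
            return False
    return True


ROWS = ["III,IIII;VII", "II,II;IIII", "VI,II;VIII"]
for unit in (1, 10, 12, 1000):
    print(unit, table_consistent(ROWS, unit=unit))  # consistent for every unit
print(table_consistent(ROWS, v_ratio=4))  # a wrong ratio fails
```

The rows pin down the ratio between the two signs but say nothing about what a single I means, which is the limit the comment describes.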

    Your idea cannot work unless the data you feed the language model contains correlated items. It can't. Imagine I feed a predictor a long list of images on the one hand and, on the other, a long list of randomly ordered image descriptions that may or may not match the images. Do you think you could learn a foreign language that way? You absolutely need the image of a donkey to be associated with the name for that animal in the foreign language, and the algorithm is no different.

    • WorldMaker 3 years ago

      Also, the assumption that math is universal so sharing vocabulary in math is helpful for bootstrapping language understanding is a fascinating assumption to question. Even if you can explain Pi and prove that you can mutually understand trigonometry that might give you some small portion of engineering insight, but it can't yield most of the rest of engineering such as design or aesthetics (or emotions) or any number of other things that make for useful project communication.

      It's something I've often thought about in the way the Voyager record was built, and Sagan's novel Contact assumes it, as do many others. Even recently, the novel Project Hail Mary borrowed the assumption that math is enough shared language to bootstrap understanding. I think the movie Arrival did some of the best work of showing why that wouldn't necessarily work, but it also had the language in question designed by a mathematician and still fell into some parts of the assumption/trope. I'm not saying any of these examples are bad for doing this; I certainly love them all. It's still a small something worth criticizing.

      It's certainly not a bad thing to want to communicate math, and to hope that things like Pi are "constant enough" to provide bootstraps to other communications, but it's also fascinating how much science fiction thinking (and real-world scientific thinking such as the Voyager Record) thinks that you can just sort of "yada yada yada" your way from "so we established communications of basic mathematical constants and concepts" in some sort of straight line to "now we can communicate all sorts of other things".

    • cubefox 3 years ago

      Those are good reasons, yet the language model discussed above would presumably understand Alienese just as well as it would understand English. So if an LLM understands the meaning of an expression X and of an expression Y, wouldn't it be able to tell how similar those meanings are?

      > here we've already reached the end of what a text per se can tell you about its meaning if the signs are not clearly pictorial

      Note that language models today seem to be quite good at understanding English, even though they are only trained on symbolic text, not on any images.

      • DemocracyFTW2 3 years ago

        Your understanding of 'understanding a language' is obviously different from mine when you write that "the language model discussed above would presumably understand Alienese just as well as it would understand English" and "language models today seem to be quite good at understanding English".

        Language models don't understand any natural language; they're very good at manipulating it (and us!) in terms of continuing patterns across the scale from letters (orthography) to phrases and paragraphs of seeming utility and correctness. In that regard, yes, the aforementioned model will likely have no difficulty producing novel outputs that would appear as useful and correct to Alienese speakers as they do to English speakers. However, this assumption, too, should come with a disclaimer: unless someone produces a reliable test of the utility and correctness of the same LM across a variety of natural and invented languages with divergent grammars (including, e.g., polysynthetic languages, which have a very different view of what constitutes a 'word') without having to tweak any of the many finicky parameters of these models, we can't be sure the model won't produce garbage when trained on the next 'exotic' language. So who knows: English uses very few infixes, and a lot of its grammar takes place between fairly constant, fairly short words; a model with a given set of parameters that works well for such languages may not be very good at languages whose words are built from many specific prefixes, infixes and suffixes that are as expressive as entire phrases in English. Just as the current generation of text-to-image generators is pretty good at a lot of things but then screws up when asked to picture a cornfield.

        • cubefox 3 years ago

          > Your understanding of 'understanding a language' is obviously different from mine when you write that "the language model discussed above would presumably understand Alienese just as well as it would understand English" and "language models today seem to be quite good at understanding English".

          > Language models don't understand any natural language, they're very good at manipulating it (and us!) in terms of continuing patterns across the scale from letter (orthography) to phrases and paragraphs of seemingly utility and correctness.

          Come on, chatting for an hour with GPT-4 should remove all doubt that it understands you quite well. Otherwise, what would understanding be? Lest it turn out that we are stochastic parrots, too!

          https://www.bing.com/images/create/cornfield/64b58e89d412420...

      • tiluha 3 years ago

        The trained model would likely be able to understand both Alienese and English equally well, but it never learned to translate even one word or context. It might have an internal representation for "eating food" in both languages, but since no links exist between the languages, the embeddings will not be close.

        You could try this on Earth if you train a model on two separate languages, being careful that the training data does not contain any mixed language. But even then, modern human languages most likely have too much cross-contamination. It would be an interesting experiment nevertheless.

        • cubefox 3 years ago

          That's the question, would the embeddings be close?

          It's not clear that they wouldn't. Would an embedding of the Alienese word for "and" be close to the embedding of the English "and"? This does seem quite possible to me.

          > You could try this on Earth if you train a model on two separate languages, being careful that the training data does not contain any mixed language. But even then, modern human languages most likely have too much cross-contamination. It would be an interesting experiment nevertheless.

          I agree. Though shouldn't we be able to answer this a priori? It sounds like a mathematical question.
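A toy illustration of one reason closeness across two separately trained embedding spaces is not guaranteed: for training objectives that depend only on dot products between vectors, a learned space is determined at best up to a rotation, so rotating one space leaves every within-language similarity unchanged while changing every cross-language one. The 2-D vectors and the alien tokens `zq` and `vr` below are hypothetical, and this says nothing about a single jointly trained model, which is the open question here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def rotate(vec, theta):
    """Rotate a 2-D vector counterclockwise by theta radians."""
    x, y = vec
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))


# Hypothetical toy embeddings for two English words.
english = {"and": (1.0, 0.2), "or": (0.8, 0.6)}
# A hypothetical "Alienese" space with identical geometry, rotated by 90 degrees.
alien = {"zq": rotate(english["and"], math.pi / 2),
         "vr": rotate(english["or"], math.pi / 2)}

within_en = cosine(english["and"], english["or"])
within_alien = cosine(alien["zq"], alien["vr"])
across = cosine(english["and"], alien["zq"])
print(round(within_en, 3), round(within_alien, 3), round(across, 3))
```

Both spaces encode the same relation between their two words, yet the counterpart of "and" ends up orthogonal to it; recovering the rotation normally requires some anchor pairs, which is the Rosetta-stone point made earlier in the thread.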

im3w1l 3 years ago

I wonder if LLMs would be able to crack it. They should have a decent shot, I feel.