Google Labs: word frequency in books over the last 200 years

ngrams.googlelabs.com

23 points by prat 15 years ago · 27 comments · 1 min read

I was surprised to see the high popularity of the word "fuck" prior to 1820

EliAndrewC 15 years ago

The example in the OP ("fuck") was so common until the early 1800s because of the typographic convention of substituting an f for an s. In other words, the word "suck" was being written as "fuck", which is why it appeared so often before then.
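A minimal sketch of the mix-up, assuming the scanned text contains the long s character (ſ, U+017F) and that an OCR step misreads it as an f. `ocr_misread` is a hypothetical stand-in for that error, not a description of Google's actual pipeline. Unicode NFKC normalization folds the long s back to a plain s:

```python
import unicodedata

LONG_S = "\u017f"  # ſ, LATIN SMALL LETTER LONG S


def ocr_misread(text: str) -> str:
    """Hypothetical OCR error model: every long s is read as an f."""
    return text.replace(LONG_S, "f")


def normalize_long_s(text: str) -> str:
    """Fold the long s to a round s via Unicode NFKC compatibility normalization."""
    return unicodedata.normalize("NFKC", text)


printed = LONG_S + "uck"  # "suck" as it might appear in pre-1800 type: ſuck
print(ocr_misread(printed))
print(normalize_long_s(printed))
```

The first call yields "fuck" and the second "suck", which is the gap between a naive scan and a text that accounts for the long s.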

  • orls 15 years ago

    You can see the changeover quite clearly by comparing the two against each other: http://ngrams.googlelabs.com/graph?content=fuck%2C+suck&...

    If we assume all pre-1800ish mentions of 'fuck' are really meant to be 'suck', 'suck' still features much more prominently in the corpus before then than after.

    Any ideas why that might be? E.g. certain types of text that were more common before that era, or other (less, er, 'suck'y) types of text that came after, 'diluting' the corpus?

  • tseabrooks 15 years ago

    Any background on the origin of this typesetting convention? I'd like to know the whys and whatfors...

    • EliAndrewC 15 years ago

      The technical term for this is a "long s", which Wikipedia describes at great length: http://en.wikipedia.org/wiki/Long_s

      • edge17 15 years ago

        thanks, that was incredibly fascinating and enlightening, especially the examples.

        The long s survives in elongated form, and with an italic-style curled descender, as the integral symbol ∫ used in calculus; Gottfried Wilhelm von Leibniz based the character on the Latin word summa ("sum"), which he wrote ſumma. This use first appeared publicly in his paper De Geometria, published in Acta Eruditorum of June 1686,[2] but he had been using it in private manuscripts at least since 1675.[3]

  • jcr 15 years ago

    If you change the bounds to include the 1700s, the prevalence of the term is even more pronounced (if you'll pardon the pun).

Groxx 15 years ago

Utterly awesome. http://ngrams.googlelabs.com/graph?content=My+name+is+Inigo+...

Potentially even more awesome is that they have the entire dataset available for download o_O

edit: case sensitivity is more fun than insensitivity: http://ngrams.googlelabs.com/graph?content=Star+Trek%2Cstar+... vs http://ngrams.googlelabs.com/graph?content=star+trek%2CStar+...

edit2: there are a whole bunch of geek-term bumps around and just after 1900. Anyone know why? E.g.: http://ngrams.googlelabs.com/graph?content=Star+Wars&yea...

  • splat 15 years ago

    I have no idea, but my guess is that they don't know the dates for some books and the system automatically classifies the publication date as "1900" or "1901." If you search the word "quark," you also get a bump at around 1900 even though the word wasn't coined until Joyce's Finnegans Wake in 1939.
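splat's guess can be illustrated with a toy example: if undated books were assigned a default year, every modern word in them would pile up in that year. The `FALLBACK_YEAR` of 1900 and the `year_histogram` helper below are hypothetical, not Google's documented behavior, and the book texts are made up:

```python
from collections import Counter
from typing import Optional

FALLBACK_YEAR = 1900  # hypothetical default used when a book's date is unknown


def year_histogram(books: list[tuple[Optional[int], str]], word: str) -> Counter:
    """Count occurrences of `word` per publication year, filing undated books under FALLBACK_YEAR."""
    hist: Counter = Counter()
    for year, text in books:
        bucket = year if year is not None else FALLBACK_YEAR
        hist[bucket] += text.split().count(word)
    return hist


books = [
    (1880, "no such word here"),
    (None, "quark quark"),  # modern book with missing date metadata
    (1990, "a quark is a particle"),
]
print(year_histogram(books, "quark"))
```

The undated book contributes two occurrences to 1900, producing a "quark" bump decades before the word existed.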

PetrolMan 15 years ago

I find it kind of interesting that a lot of words peak around the middle of the 19th century and have been in decline ever since. I'm guessing this has something to do with the increasing number of books published but it is still kind of hard for me to imagine that "the" is less commonly used now than one hundred years ago. The pattern holds true for a lot of common words...
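The viewer plots a relative frequency: a word's count in a given year divided by the total number of words in that year's books, not a raw count. A minimal sketch with made-up numbers (not real Ngram data; `relative_frequency` is a hypothetical helper) shows how a word's raw count can rise while its plotted share falls, if the rest of the corpus grows faster:

```python
def relative_frequency(word_counts: dict[int, int], total_counts: dict[int, int]) -> dict[int, float]:
    """Percentage of all words in each year's corpus that are the given word."""
    return {year: 100.0 * word_counts[year] / total_counts[year] for year in word_counts}


# Hypothetical toy numbers for illustration only
the_counts = {1850: 600, 1950: 900}
all_words = {1850: 10_000, 1950: 20_000}
print(relative_frequency(the_counts, all_words))
```

Here "the" appears 900 times in 1950 versus 600 in 1850, yet its plotted value drops from 6.0% to 4.5% because the total corpus doubled.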

sylvinus 15 years ago

Is this weird? :)

http://ngrams.googlelabs.com/graph?content=google&year_s...

edge17 15 years ago

http://ngrams.googlelabs.com/graph?content=terrorist&yea...

thekevan 15 years ago

They had smartphones in the 1900s? Could this be related to that woman supposedly seen speaking on a cell phone in the Charlie Chaplin video?

http://ngrams.googlelabs.com/graph?content=smartphone&ye...

(Actually, "internet" also has a similar spike. I suspect some books are mislabeled in their dates.)

jalmos 15 years ago

Given the birther-related news today, I was curious about another uncouth term. Sad results:

http://ngrams.googlelabs.com/graph?content=nigger&year_s...

iunk 15 years ago

I don't know in what context they could have used "geek" in 1800. http://ngrams.googlelabs.com/graph?content=geek&year_sta...

ryan42 15 years ago

some interesting results: (sad about liberty)

http://ngrams.googlelabs.com/graph?content=liberty&year_...

http://ngrams.googlelabs.com/graph?content=l33t&year_sta...

http://ngrams.googlelabs.com/graph?content=hacker&year_s...

samuel1604 15 years ago

while love is going down http://ngrams.googlelabs.com/graph?content=love&year_sta... sad..

dlsspy 15 years ago

I'm going to have an impact on google's internet bill this month.
