How Search Works

google.com

324 points by vijaydev 13 years ago · 107 comments

philsnow 13 years ago

I missed most of the content on this ... page ? Exhibit ? Installation ? whatever it's called, because it told me to scroll, I did, and I scrolled through a bunch of what looks like empty space and arrived at the end ("and that's how search works"). The user is apparently supposed to stop and watch some animation at certain places, but it's not clear where to stop scrolling.

Perfect example, near the top there's some text about "It's made up of over[........] 30 TRILLION[.........] INDIVIDUAL PAGES[........] and it's constantly growing." But there's nothing to indicate that I should stop somewhere and wait for some more text to show up.

Maybe they should limit how far down you can scroll by setting the height of some element, and only increase it when the animation is finished.

Edit: the key problem here isn't the "scrolling makes things happen" gimmick that's popular lately. The problem is that it starts certain animations or fade-ins some time after I've already skipped past an apparently blank space.
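
One way to read the suggestion about limiting how far down you can scroll: let the scrollable height cover only the sections whose animations have finished, plus the one currently playing. A minimal sketch, assuming hypothetical section objects with `height` and `animationDone` fields (these names are illustrative and do not come from the page's code); applying the result to a real element's height is left out.

```javascript
// Hypothetical model: each section has a pixel height and a flag that is
// set once its animation has finished playing.
function allowedScrollHeight(sections) {
  let total = 0;
  for (const section of sections) {
    total += section.height;
    // Stop at the first section that is still animating: the reader can
    // scroll into it, but not past it.
    if (!section.animationDone) break;
  }
  return total;
}

const sections = [
  { height: 800, animationDone: true },
  { height: 600, animationDone: false },
  { height: 900, animationDone: false },
];
console.log(allowedScrollHeight(sections)); // 1400
```

Recomputing this value each time an animation ends would extend the page one section at a time.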

  • will_brown 13 years ago

    I gave you a +1 because I had the same "scrolling issue".

    After your comment I noticed a lot of comments on the same issue, so I decided to try it again. The second time I noticed that a blue arrow flashes at the bottom of the screen after all the content has populated, almost prompting you to scroll down. I suppose most everyone, including me, initially scrolled too fast to even see the first "arrow/prompt". Despite the discovery of the prompt feature, some of the issues remain. For example, wondering how far to scroll down before stopping (maybe pgdn?), and wondering which parts of the "page ? Exhibit ? Installation ? whatever it's called" are interactive.

    >page ? Exhibit ? Installation ? whatever it's called

    I too was unsure what to call it, but if they listen to the feedback, I think "whatever it's called" is really awesome and could be a legitimate substitute to the ppt platform. At least I would be interested in making a few presentations with it.

    • devcpp 13 years ago

      >I too was unsure what to call it, but if they listen to the feedback, I think "whatever it's called" is really awesome and could be a legitimate substitute to the ppt platform. At least I would be interested in making a few presentations with it.

      Agreed, I like it. It looks like the stuff that goes on in futuristic movies, with a lot of things happening on the screen and no one really looking at anything in particular, but it's impressive. This has potential. Who's up to make this WIC software?

  • jsmeaton 13 years ago

    Yep, happened to me too. No indication of where to pause and watch, and had to keep back-scrolling as I noticed text appear just below the chrome. Bad implementation.

  • alive-or-not 13 years ago

    Scrolling with single 'space' key presses works fine on a desktop.

dangrossman 13 years ago

The most interesting thing there is the live view of the most recently deleted webspam. I wonder what blackhat SEO firms can learn from that to better avoid the filters.

  • gokhan 13 years ago

    Exactly. And someone somewhere is writing a script to hammer those screenshots and collect as many as they can.

  • badgar 13 years ago

    I haven't looked too closely, but that view says it is only giving examples of "pure spam," as opposed to the more sophisticated forms of spam described. One might imagine that "pure spam" is the easiest to detect, so giving examples of pure spam might not be giving much away.

area51mafia 13 years ago

It's nice overall, but the timing for making items appear is a little slow. I was past most headers by the time they appeared, and I don't think I scroll too incredibly fast.

  • crynix 13 years ago

    It might be your machine/browser/os. The same thing happened on my laptop (OSX/Chromium), but it worked fine on my desktop (Arch/Chromium).

    • ScottWhigham 13 years ago

      If you read the top voted comment for this post (as I write this), it describes the same experience though.

franze 13 years ago

thx matt and the google search team for doing this. it's nothing new for technically inclined people, but every little bit helps. helps for what? teaching people to worry about the right aspects of search and the impact on their business, instead of worrying about bullshit phrases that were planted in their heads by an SEO agency key account or a blogpost from 2008. so well yes, thx for doing this. i will send it to my clients (and tell them to click on the bubbles, even though they don't look clickable)

now an anecdote (because i feel like telling one): this week started for me with an interview that finally got published http://werbeplanung.at/news/marketing/2013/02/interview-mit-... (it's german) in that interview i claimed that

* 80% of everything written about SEO and Google is bullshit

* that all the rumors, tips and trends are actually hurting business

* that we should treat SEO as a numbers based craft of constant optimizations

* instead of the esoteric bullshit art it is currently

* and, if search traffic is important for the success of a business, they must rid themselves of external (agency) dependencies and develop internal structures

nothing too far-fetched i think. everybody knows the SEO vertical is full of bullshit, i just took some time to estimate a number (based on a random sample of collected blogposts (that at least one person tweeted about))

yeah, i got a lot of angry emails, skype messages, linkedin messages, xing messages after the interview was published.

most of them mentioned at least one of these words

  * pagerank
  * whitehat
  * blackhat
  * grayhat
  * linkjuice
  * panda
  * penguin ...

so yeah, thx google for educating people about search. keep up the good work.

  • Julianhearn 13 years ago

    Your 80% is based on what exactly? A tiny sample size. Please, if you don't have solid data, don't quote percentages; it just encourages people to spread the number like it's a fact, which it isn't.

    If you read the right sources a majority of seo advice is correct.

    www.seomoz.org

    http://static.googleusercontent.com/external_content/untrust...

    www.inbound.org (homepage stuff that has been voted up.)

    • Inufu 13 years ago

      >> If you read the right sources a majority of seo advice is correct.

      That's a contradiction. If you have to read the right sources, then by definition the majority of advice is not correct.

      • Julianhearn 13 years ago

        Why does it mean the majority of advice is not correct? That is a myth.

        • nialo 13 years ago

          Because the majority of advice is coming from the _wrong_ sources

          • Julianhearn 13 years ago

            You say majority like it's fact. It's not.

            Here is a list of the top 100 SEO blogs; find the BS in there. www.branded3.com/seo-blogs/

  • GhotiFish 13 years ago

      "80 Prozent von allem, was über SEO geschrieben wird, ist Bullshit"
    
    some things just cross the language boundaries.
tmoertel 13 years ago

Has anyone deciphered the fat-mustache diagram in the "Query Understanding" circle? It's in the Algorithms section.

At first I thought it was supposed to represent a Gaussian-like probability distribution. But when I clicked on it, the resulting animation showed a series of such distributions getting flattened by some kind of distribution-flattening hydraulic press. The accompanying caption: "Gets to the deeper meaning of the words you type."

If I was confused before, now I was completely lost.

How is deeper meaning represented by distribution flattening? I'd think it would be just the opposite, raising probability mass around the likely meanings, not spreading it out into a uniform distribution over all meanings.

Baffling.

If anyone has figured it out, please do share.

(Maybe I'm taking the diagrams too seriously.)

EDITED TO ADD: New option: If you don't have any clue what it means either, come up with an entertaining yet plausible story that fits the hydraulic-press-vs-mustaches animation and share that story instead.

EDITED TO ADD: Example: At Google’s new eco-friendly data centers, NLP computations are performed by genetically enhanced inchworms. Difficult queries, however, can cause the inchworms to get cricks in their backs. In such cases, Google’s innovative back-massager descends and restores the inchworms to their preferred position (prone), from which they can return to their computations with renewed vigor.

  • MattSayar 13 years ago

    You're taking diagrams too seriously.

    But the way I interpreted it was, before, the query was short, scrunched up, and slightly ambiguous. The algorithm then lengthened it, representing expanding it to find the deeper meaning.

  • arithma 13 years ago

    I was confused for a little bit by that. The way I took it was: "Google removes the wrinkles from your query so as to make it processable."

  • bcoates 13 years ago

    That actually seems to be what Google does to your keyword searches: replace the specific with the general, turn proper names into redundant phrases ("schannel socket" -> "channel socket"), suggest dropping keywords, etc.

  • plainOldText 13 years ago

    I think Query Understanding might trigger the weather, conversion and the other widgets to be displayed at the top of your search results. It's just a guess, since I don't work for Google. (:

dylangs1030 13 years ago

I don't know what to take from this.

That search is very complex (I knew that, but not with this technical detail).

Or...that Google is trying very hard to maintain user interest with gimmicky shows of why it's cool and cutting edge and necessary.

Not that Google isn't those things...this just seems like an unnecessary expenditure of time. We know it's complex Google. Improve some other features and stop shutting others down instead of making these web 2.0 animations.

jojopotato 13 years ago

Interesting that they show the approximate number of searches / second at the bottom. Is that an otherwise publicly available number?

eykanal 13 years ago

I was halfway through before I realized that some of the content was clickable.

Very nice page, though.

JDDunn9 13 years ago

Their characterization of their spam procedures is grossly misleading. They do not send emails to most people that have been penalized, nor do they give clear instructions on how people can fix their sites.

Thousands of small sites were killed by Panda for no good reason, and have little hope of getting their traffic/incomes back. Google's spam policy is skewed heavily in favor of large sites and their own properties.

  • rossjudson 13 years ago

    Didn't read that way to me. Doesn't it say that the webmaster tools page is the primary way to get notifications?

    Crap factor = %advertising on page.

_mvuc 13 years ago

I keep checking every so often, but searching for "this phrase" or +absolute +requirement is still broken. Even "Verbatim", isn't. If they can't even get simple search right, who would trust them with anything more?

  • jaytaylor 13 years ago

    I agree, it is super irritating to no longer be able to do precise searches on google like I used to. Is there another search engine you would recommend which provides this functionality?

  • eitland 13 years ago

    Same here. Tried reporting it a number of times in different ways but nothing ever happens. Have posted a few examples in response to moultano's reply further down in this thread.

  • moultano 13 years ago

    Do you have some example queries to debug?

    • _mvuc 13 years ago

      I wish I had saved the results every time over the last few years that Google showed me a page it claimed had what I was looking for, when neither searching the visible text nor even the source code of the page produced any such string. I am sure that it's happened to me hundreds of times by now, if not thousands. For a long time, it was surprising and ranged from annoying to infuriating. Now I just sigh and accept it as the cost of Googling.

    • eitland 13 years ago

      I have: - http://techinorg.blogspot.com/2013/03/what-is-going-on-with-...

      edit: Here are just the queries:

      "sublime text 2" "focus group"

      cisco "anyclient" - this one gets silently rewritten to cisco anyconnect

      shopify "deduplicate" - with verbatim activated -

aviswanathan 13 years ago

Scrolling is really becoming the new thing in UX design. It's an interesting contrast to the 'movie-like' flash animations of a few years ago that required no interaction on behalf of the user.

  • gnu8 13 years ago

    At some point users started closing the tab instantly as soon as it becomes clear something non-interactive like a flash movie is the central element of the page. The scrolling page is the optimal way of letting the user read the content at the desired pace.

    • lucb1e 13 years ago

      A YouTube clip would have done just fine. You can pause that if you need a slower pace. Scrolling is very slow here (and I'm using Google's own Chrome on mainstream hardware), and it's never clear where I should stop scrolling to view the page. I don't get the fuss about scrolling websites.

  • largesse 13 years ago

    > Scrolling is really becoming the new thing in UX design.

    Am I the only one who finds it irritating as hell to scroll when it renders slowly? I don't think this is the end game. There has to be something better.

prezjordan 13 years ago

They left out the part where they index your emails and choose items you agree with over items you don't :)

  • hurstdog 13 years ago

    I think you're joking, but just for those that don't think that, we don't actually do that.

    • will_brown 13 years ago

      The following is direct from Google's Security and Privacy:

      "In order to provide some of the core features in Google Apps products, our automated systems will scan and index some user data. For example:

      -Email is scanned so we can perform spam filtering and virus detection.

      -Priority Inbox, a Gmail feature, scans email message to identify which messages are considered important and which are considered not important.

      -If you are using Google Apps (free edition), email is scanned so we can display contextually relevant advertising in some circumstances.

      -Some user data, such as documents and email messages, are scanned and indexed so your users can privately search for information in their own Google Apps accounts.

      *Google Apps data is not part of the general google.com index, except when users choose to publish information publicly."

    • kappaloris 13 years ago

      then how can you scroogle and boobble people without that information?

    • alan_cx 13 years ago

      Where is the stuff about the creepy invasion and abuse of our privacy?

      I know, I know, you don't do that. Nope, no one does. Everything is fine and dandy. Smile everyone, no problem here.

    • Stealth- 13 years ago

      There definitely is a form of a search bubble though, right?

    • anoncow 13 years ago

      I don't remember if hotmail used to run ads.

      • snowwrestler 13 years ago

        It did and they were display ads. Incredibly distracting.

        At one point the "homepage" of Hotmail was a huge ad space, stories from MSN, and a tiny link to "Inbox."

        The new Outlook is so much better. If Hotmail had evolved that way earlier, I would not have switched to Gmail.

        • anoncow 13 years ago

          So it is a bit hypocritical of MS to talk about ads in Gmail. But again, were those ads contextual?

          • snowwrestler 13 years ago

            The "Scroogled" campaign has nothing to do with products or customers; the point is to broaden the PR base for Microsoft's ongoing campaign to convince the feds to initiate anti-trust proceedings against Google. That is why they hired a political PR executive to create the campaign.

  • weareconvo 13 years ago

    Grammatically, that makes no sense :)

Xorlev 13 years ago

38,800 requests/second according to their estimation.

johnmurch 13 years ago

Is this just PR for Google? Would rather see a more technical approach - although great for forwarding to clients when asked :)

  • ChuckMcM 13 years ago

    Apparently, perhaps the 'scroogled' campaign is having an effect.

    However it does give a better insight into the challenges of building a search product. It is a series of really challenging problems. So many people take search for granted these days.

    • johnmurch 13 years ago

      Yes and no - 'scroogled' is bringing up stuff like selling ads based on context - but are you paying $20/year for outlook.com email? Gmail is free and pretty awesome (haven't gotten spam in years).

  • darxius 13 years ago

    It may just be PR, but I do think a lot of non-technical people can benefit from going through the animation. It's pretty amazing that that is what's happening. It could just be an attempt by Google to expose their craft to the technical layperson.

cryowaffle 13 years ago

Whoa... really, 100 MILLION gigabytes to store "The Index"? Wow. That's big.

  • aiiane 13 years ago

    aka 95+ petabytes.

    • ithkuil 13 years ago

      100 million gigabytes = 100 petabytes ~= 88.8 pebibytes

      100 million gibibytes ~= 95 pebibytes
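
The two readings above can be checked with a few lines of arithmetic. This is only a sketch of the unit conversion; the "100 million" figure is the one quoted from the page, and whether Google meant decimal or binary units is not stated in the thread.

```javascript
// Decimal (SI) and binary (IEC) unit sizes in bytes.
const GB = 1e9;       // gigabyte
const PB = 1e15;      // petabyte
const GiB = 2 ** 30;  // gibibyte
const PiB = 2 ** 50;  // pebibyte

const count = 100e6; // "100 million" units, as quoted from the page

// Reading the figure as gigabytes (decimal):
const bytesIfGB = count * GB;
console.log(bytesIfGB / PB);               // 100 (petabytes)
console.log((bytesIfGB / PiB).toFixed(1)); // 88.8 (pebibytes)

// Reading the figure as gibibytes (binary):
const bytesIfGiB = count * GiB;
console.log((bytesIfGiB / PiB).toFixed(1)); // 95.4 (pebibytes)
```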

      • runlevel1 13 years ago

        I see the value of this distinction, but I can't shake the feeling that a word used for years has been co-opted by marketing and replaced with something that sounds silly when spoken out loud.

    • DigitalJack 13 years ago

      I prefer 100 MegaGigaBytes, son.

sytelus 13 years ago

There are some good facts and numbers hidden in a rather toy explanation:

1. Spam detection is automatic

2. There are 8 types of spam

-Unnatural outbound links (link selling)

-Content copy/manufacturing

-Keyword stuffing

-Forums/user generated spam

-Parked domains

-Sites hosted on spammy DNS

-Different content for humans and bots

-Hacked sites

3. Google is removing as many as 50K spam sites per month, they get 8K reconsideration requests

4. Google's machine-learned relevance model may be using about 200 features

manojlds 13 years ago

> By the way, in the 47 seconds you've been on this page, approximately 1,813,260 searches were performed.

Aren't these just some random numbers that they pull out of the air?

  • jroseattle 13 years ago

    Here's the unminified JS on the site responsible for the numbers updates.

       var kd = function () {
        function a() {
            e = e || Q("number_of_seconds");
            d = d || Q("searches_count_num");
            f = f || Q("searches_count_unit");
            var a = ~~ (((new Date).getTime() - h) / 1E3 % 86400),
                k = a * b + "";
            f.innerHTML = " " + c[Math.ceil(k.length / 3)] || "";
            e.innerHTML = a;
            d.innerHTML = k.replace(/(\d)(?=(\d\d\d)+(?!\d))/g, "$1,")
        }
        var b = ~~ (1E11 / 2592E3),
            c = " hundred thousand million billion trillion quadrillion quintillion sextillion septillion octillion nonillion decillion undecillion duodecillion tredecillion quattuordecillion quindecillion sexdecillion septendecillion octodecillion novemdecillion vigintillion".split(" "),
            e, d, f, h = (new Date).getTime();
        return {
            hc: a,
            rb: function () {
                a();
                setInterval(a, 100)
            }
        }
       }();
    
    It's just running on an interval and doing in-page calculations, so it's entirely estimated. The value of "b" in this function evaluates to a little over 38,000 (https://www.google.com/search?q=1E11+%2F+2592E3) which they're using as the basis for the calculation.
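
To make the arithmetic in that snippet easier to follow, here is a small restatement with descriptive names. The names are mine, not the site's. Note that 2592E3 seconds is 30 days, so the constant reads like 1E11 searches spread over a 30-day month; that interpretation is an assumption, not something the code states.

```javascript
// Same arithmetic as the site's snippet, with descriptive names.
// ~~x truncates toward zero (like Math.trunc for values that fit in 32 bits).
const SECONDS_PER_30_DAYS = 2592e3; // 30 * 86400
const searchesPerSecond = ~~(1e11 / SECONDS_PER_30_DAYS);

// Insert a comma before every trailing group of three digits.
function withThousandsSeparators(n) {
  return String(n).replace(/(\d)(?=(\d\d\d)+(?!\d))/g, "$1,");
}

// Estimated searches after a given number of seconds on the page.
function searchesAfter(seconds) {
  return withThousandsSeparators(seconds * searchesPerSecond);
}

console.log(searchesPerSecond); // 38580
console.log(searchesAfter(47)); // 1,813,260
```

The 47-second result matches the figure quoted at the top of this subthread, which supports the reading that the counter is simply elapsed time multiplied by a fixed rate.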
  • robinh 13 years ago

    Not sure why you're being downvoted, as it seems like a legitimate question, but...

    I don't think so. It seems logical that Google's been keeping statistics about this sort of thing, so it doesn't surprise me that they keep track of such things as 'average queries per second'.

  • alok-g 13 years ago

    That would be about 38K searches per second. Does this include Google instant searches?

    Google search results show a time value for each search, e.g.: About 2,210,000,000 results (0.12 seconds). Is this the machine time per search? This number is often around 30 ms, give or take a factor of two. If so, each machine can handle about 30 searches per second, and 38K searches per second would need about 1000 machines. That sounds a bit too low... so my interpretation must be wrong somewhere.
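
The back-of-envelope estimate above can be written out as follows. It is a sketch under the same simplifying assumption the comment makes, namely that each machine handles one search at a time and that the displayed time is machine time; neither is confirmed anywhere in the thread.

```javascript
// Machines needed = arrival rate * time per search, assuming each machine
// handles one search at a time (a simplification).
function machinesNeeded(searchesPerSecond, secondsPerSearch) {
  return Math.ceil(searchesPerSecond * secondsPerSearch);
}

console.log(machinesNeeded(38580, 0.030)); // 1158
console.log(machinesNeeded(38580, 0.120)); // 4630
```

With the 30 ms figure the result is on the order of a thousand machines, as the comment says; with the 0.12 s from the example result line it is roughly four times that.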

    • jfim 13 years ago

      It's probably the wall time for the various backend services to respond to the query. If you think about it, a Google search result is actually many things; it has results from various sources, such as the web, images, videos, news, social signals from G+, etc. All of those are different services that are aggregated to build your result page.

      Since all of those queries are fired at the same time, the only metric that matters at the end is the wall time, not the CPU time used during the query.

      I also seriously doubt that the servers that handle the Google front page can only do one query at a time; at the very least, they're multithreaded, but probably concurrent. It probably works as below:

      1. Parse query

      2. Send query to backend servers

      3. Wait until all backends replied or at most 250ms (or some other timeout)

      4. Assemble the result page and ship it back to the client

      While the server is idling for the backends to reply, it probably processes other queries; it wouldn't make sense to waste that much CPU power.
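
The guessed flow above (send the query to all backends at once, wait up to a deadline, assemble whatever came back) can be sketched with promises. This illustrates the pattern the comment describes, not Google's implementation; `withDeadline`, `search`, and `fakeBackend` are hypothetical names, and error handling for failing backends is omitted.

```javascript
// Resolve with the backend's result, or with null if it misses the deadline.
function withDeadline(promise, ms) {
  let timer;
  const deadline = new Promise((resolve) => {
    timer = setTimeout(() => resolve(null), ms);
  });
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}

// Steps 2-4: fire all backend queries at once, wait for each up to the
// deadline, then assemble whatever arrived in time.
async function search(query, backends, timeoutMs) {
  const pending = Object.entries(backends).map(async ([name, queryBackend]) => {
    const result = await withDeadline(queryBackend(query), timeoutMs);
    return [name, result];
  });
  const settled = await Promise.all(pending);
  return Object.fromEntries(settled.filter(([, result]) => result !== null));
}

// Hypothetical backends that answer after a fixed delay.
const fakeBackend = (label, delayMs) => (query) =>
  new Promise((resolve) => setTimeout(() => resolve(`${label}:${query}`), delayMs));

const backends = { web: fakeBackend("web", 5), video: fakeBackend("video", 200) };
search("cats", backends, 50).then((page) => console.log(page));
// The logged object contains only the "web" entry; "video" missed the deadline.
```

Because the waiting is done with promises rather than blocked threads, the same process is free to start other queries in the meantime, which is the point the comment makes about not wasting CPU while idling.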

      Finally, your example says 0.12s (a random query on my end gave a response time of 0.69s), which is 120ms (or 690ms for mine), which is more than twice 30ms.

    • Geee 13 years ago

      You didn't define 'machine'. If the 'machine' is Google's supercomputer grid cloud cluster, then yes, each search takes 30 ms of machine time.

      • alok-g 13 years ago

        Is there any publicly known information about what the 30 ms number means (or alternatively what the machine is)? Given 30 ms number and the number of searches per second, the number 1000 means something; I just don't know what.

  • mcintyre1994 13 years ago

    It probably just increments by some fixed amount each second, but it seems like a statistic they would have an estimate of.

    • raylu 13 years ago

      From the beautified app.min.js:

          function a() {
              e = e || Q("number_of_seconds");
              d = d || Q("searches_count_num");
              f = f || Q("searches_count_unit");
              var a = ~~ (((new Date).getTime() - h) / 1E3 % 86400),
                  k = a * b + "";
              f.innerHTML = " " + c[Math.ceil(k.length / 3)] || "";
              e.innerHTML = a;
              d.innerHTML = k.replace(/(\d)(?=(\d\d\d)+(?!\d))/g, "$1,")
          }
          var b = ~~ (1E11 / 2592E3),
      
      So yes.
aeon10 13 years ago

A beautifully designed page more than anything else

lysium 13 years ago

Nice scroll-UI! Took some time to see the clickable items. Interesting bits about spam pages.

moeedm 13 years ago

An awful way to learn anything.

state 13 years ago

The better people understand their tools, the more effectively they can use them.

wfunction 13 years ago

"We write programs & formulas to deliver the best results possible."

No kidding.

denysonique 13 years ago

Some of the live listed 'spam' pages appear to be genuine to me.

joshhart 13 years ago

Answer: It uses a bunch of skip lists.
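
For readers wondering what skip lists buy a search index: one common use is intersecting sorted lists of document ids for a multi-term query, jumping ahead in whichever list is behind instead of stepping one entry at a time. The sketch below uses a flat array with a fixed skip stride as a stand-in; it is a simplification for illustration, not Lucene's actual data structure, and all names are hypothetical.

```javascript
// Move forward in a sorted postings list to the first entry >= target.
// A "skip" jumps `stride` entries at once when that cannot overshoot.
function advance(postings, pos, target, stride) {
  while (pos + stride < postings.length && postings[pos + stride] <= target) {
    pos += stride;
  }
  // Finish with a short linear scan.
  while (pos < postings.length && postings[pos] < target) {
    pos += 1;
  }
  return pos; // index of first entry >= target, or postings.length
}

// Documents containing both terms: walk the two lists, skipping ahead in
// whichever list is behind.
function intersect(a, b, stride = 4) {
  const out = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      out.push(a[i]);
      i += 1;
      j += 1;
    } else if (a[i] < b[j]) {
      i = advance(a, i, b[j], stride);
    } else {
      j = advance(b, j, a[i], stride);
    }
  }
  return out;
}

const cats = [1, 4, 9, 16, 25, 36, 49, 64, 81, 100];
const hats = [2, 16, 30, 64, 70, 100, 130];
console.log(intersect(cats, hats)); // [ 16, 64, 100 ]
```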

Source: I do hacking on top of lucene.

yarou 13 years ago

vijay: very interesting link. thought it was interesting, despite the obvious slant.

moha24 13 years ago

This is not how search works!!

asawant 13 years ago

This is brilliant !!!

OGinparadise 13 years ago

"We write programs & formulas to deliver the best results possible"

There's a slight oversight, it should be: "We write programs & formulas to deliver the most profitable results possible for this quarter"

  • will_brown 13 years ago

    I do not know why you were down-voted, perhaps for not fully forming an argument and making a tongue-in-cheek comment as an immediate response to the arrogant statement Google unnecessarily included in the description of how search works (yes, the same statement stood out to me as out of place and arrogant, but not necessarily untrue).

    As to your point, yes, Google does utilize its power, leverage, and dominance to favor itself and its own products - and don't feel too bad that others are demanding you show your proof - how quick they are to forget (and apparently Google's own employees who replied to your comment forgot also) that the FTC spent the last year investigating Google's behavior on this front - some of the allegations include using its knowledge of search and advertising to determine the most profitable online businesses, then entering the space with its own product to compete directly or just drive up the price of the advertising terms (sometimes 1000%). So imagine you were buying keyword "y" for "$x"/click - Google comes along and competes, now their product is at the top of the organic results and you will need to pay (1000 x "$x") for the same advertising - oh, by the way, when they pay (1000 x "$x") for the same ad space that money just goes back into their own pocket.

    So do not feel too bad - the FTC spent millions investigating Google to find said evidence and ultimately allowed Google to settle for $22.5 million, Google allowing others to use the Motorola patents it acquired, and changing their AdWords API. And keeping with their motto, "Don't be evil", it appears in the last 24 hours the media has gone wild alleging Google spent $25,000 to honor the FTC Director during the investigation - I know when I am being investigated for federal anti-trust allegations I too like to honor the investigator, and like Google I do not give the investigator the money directly, I give it to a 3rd party who in turn gives it to the investigator's office; this allows the investigator time to close the case before allegations are made, and when allegations are eventually made it allows the investigator the opportunity to say that at the time it was unknown who "donated" money for the honorarium.

  • moultano 13 years ago

    This is completely false. The effect on revenue is not used to make launch decisions for ranking changes.

    • JDDunn9 13 years ago

      So when Google's Panda update killed tons of user-generated-content sites like Mahalo, eHow, HubPages, etc., and greatly improved YouTube's (which is 99% garbage) rankings, that was pure coincidence?

      What about when Google rolled out universal search only after buying YouTube?

      • SquareWheel 13 years ago

            "What about when Google rolled out universal search only after buying YouTube?"
        
        Because the technology wasn't built then? Google had a video platform before they owned Youtube, you know.
      • rossjudson 13 years ago

        I don't miss any of the crap content factory sites. Do you?

        • JDDunn9 13 years ago

          Nope, and I wouldn't miss YouTube's crap factory site from the SERPS either.

    • OGinparadise 13 years ago

      Says who?

      Search is rank and display. Products is 100% bought, and that you had to "disclose". I say "disclose" because it's not apparent unless searchers click on a link; that's how ethical you are.

      What else is bought and paid for behind the scenes? Why should we trust you?

  • badgar 13 years ago

    > There's a slight oversight, it should be: "We write programs & formulas to deliver the most profitable results possible for this quarter"

    Says... you, right? Based on which examples of Google pursuing quarterly profits at the expense of users?

    • OGinparadise 13 years ago

      "Says... you, right? Based on which examples of Google pursuing quarterly profits at the expense of users?"

      Duh! Google local was a joke compared to better pages from Yelp. Google+ even worse. Pages are now filled with ads, because Google discovered that ads yield better results (how convenient!) Need I go on?

      • paranoiacblack 13 years ago

        Yes, actually. And with real evidence other than your skewed opinions. Or you can keep being overly dramatic, your choice.

        • OGinparadise 13 years ago

          "And with real evidence"

          what would qualify as "real" evidence to you? A hidden camera catching Googlers talking about this?

          • raylu 13 years ago

            It seems you're making an argument of the form "it's too difficult to collect real evidence. In fact, here's an example of a way to collect real evidence that is ridiculously difficult. Since we cannot collect real evidence, we need not use it to make arguments."

            Just because it's difficult doesn't mean you can jump to conclusions based on speculation.

            • OGinparadise 13 years ago

              No it's not difficult at all. Re-read my comment and see how I gave three clear examples that explain Google's "best results for the users" bullsh*t.

          • paranoiacblack 13 years ago

            Well, with the certainty you speak of this with, yes. Oh also, I'm being facetious because I work there and I'm interested in this secret pile of evidence you have. All things being fair, I won't deny that you've seen the things you've seen, it's just that you might be attributing them to the wrong places. Hence, the need for actual evidence.

            • OGinparadise 13 years ago

              I am certain and so is everyone with an open mind. Look at results pages and it's clear why all the Google pages and ever-growing number of ads were inserted, MONEY! Mo money for the always greedy Google.

              • Bjartr 13 years ago

                Google has for all but the very beginning of its existence been an ad company. This is not new.
