Data is at the heart of search, but who has access to it?

andreasgal.com

105 points by dpw 11 years ago · 87 comments

ChuckMcM 11 years ago

Sigh, this is incorrect.

edit: incorrect is perhaps too strong, it is incomplete.

While it is true that click tracking can be used as a relevance signal, the people who were really pissed off when the data stream got dumped were advertisers who wanted to buy AdWords. That was a very simple system, pay someone for clickstream data, extract trending queries, front those with AdWord buys to get your page on the top of Google's results, and profit.

Having built a search engine and run it for 5 years, we got to see what people felt was relevant and what wasn't in a very loose way with click stream data. Basically you have a query and 10 blue links; you can split the results into quartiles and figure out if the thing they clicked on was top half, bottom half, top quarter/second quarter, etc., and do A/B testing to see how that played out. But what we found was that the best indication of what a page was about was the text that linked to it. If you have an in-link to a page which was "<href='page'>great radio site"[1] then "great radio site" would be a query that should return that page, which might be titled something like "bob's electromagnetic spectrum imaginarium" or something equally unlikely to come up in a query string.

So the bottom line is that there are lots of ways to try to determine relevance, click stream data is a part of that but by no means the biggest factor.

[1] neutered html for obvious reasons.
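
For anyone who wants to see the two signals side by side, here is a minimal sketch, not how any real engine does it. It assumes in-links arrive as (target URL, anchor text) pairs and that a result page has 10 links; the function names, the example URLs, and the count-based scoring are all made up for illustration.

```python
from collections import defaultdict

def build_anchor_index(links):
    """Map each lowercase anchor-text term to the pages that in-links describe.

    `links` is an iterable of (target_url, anchor_text) pairs; both the shape
    and the scoring below are illustrative assumptions.
    """
    index = defaultdict(lambda: defaultdict(int))
    for target_url, anchor_text in links:
        for term in anchor_text.lower().split():
            index[term][target_url] += 1
    return index

def search(index, query):
    """Rank pages by how often query terms appear in anchor text pointing at them."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for url, count in index.get(term, {}).items():
            scores[url] += count
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda url: (-scores[url], url))

def click_quartile(position, results_per_page=10):
    """Bucket a 1-based clicked position into quartiles 1..4 of the result page."""
    if not 1 <= position <= results_per_page:
        raise ValueError("position outside the result page")
    return (position - 1) * 4 // results_per_page + 1

# Hypothetical in-links: the page title never contains "great radio site",
# but the anchor text pointing at it does.
links = [
    ("bobs-imaginarium.example", "great radio site"),
    ("bobs-imaginarium.example", "radio spectrum charts"),
    ("other.example", "great pizza"),
]
index = build_anchor_index(links)
ranked = search(index, "great radio site")
```

With 10 results, integer bucketing puts positions 1-3 in the first quartile, 4-5 in the second, 6-8 in the third, and 9-10 in the fourth, which is one arbitrary way to split a page that does not divide evenly by four.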

  • Animats 11 years ago

    The value of looking at queries is that it allows learning what questions users ask. The front end of the search process is to infer from the query what the user really should be given. That's a machine learning problem. The head of Google search remarked recently that "as the search engine gets smarter, the queries get dumber".

    This is reflected in Google's search results. A Google query which can possibly be interpreted as related to a popular culture item usually will be. Google has become more aggressive about this over the years. Their "Did you mean" result tag once offered an alternative for a second search. Now, they return results for the more popular interpretation first.

    The back side of search, page quality and ranking, is weaker than many think. Links are less useful than they used to be. Most links to business sites are now from "social" sites or forums, which are easily spammed. Using social signals was a disaster back in 2012, when, for a few months, Google went all-in on social signals. Google tried to recognize sites that "look like spam", but everybody knows that now and spam sites look better than ever. (The same thing happened with spam emails a decade ago.) Google doesn't recognize provenance, so they can be fooled by scraper sites. Google doesn't recognize the business behind the web page, so they can be fooled by marginal businesses. There are even SEO companies using machine learning to reverse engineer Google's algorithms, to find out how far they can go with keyword stuffing before a penalty kicks in.

    Google does far more manual adjustment than they did two years ago. There's an army of people doing manual ranking, and a smaller unit handling appeals from manual penalties. There was a time when Google boasted they did no manual adjustments to ranking. The automation is starting to fail.

    • sanxiyn 11 years ago

      1noon (Korean web search startup) tried to recognize provenance and was somewhat successful. But that wasn't enough to win in the market. Naver acquired 1noon.

  • Sven7 11 years ago

    But where's the competitive ecosystem in search? Innovation in search is restricted to a few hundred people in Mountain View. And that's a tragedy.

    What Google did for innovation in smartphone\tablet\browser they have gone and done the opposite for search.

  • minthd 11 years ago

    Chuck, while blekko is a great search engine (especially due to custom search), it's clear that it is very different quality-wise from Google. Same for Bing: it's not up to Google. And not for lack of trying or money (Bing).

    So how do you think Google is succeeding so well, if it's not click stream data? and why can't it be maybe a combination of things that strongly depends on click stream data that others couldn't copy?

    • ChuckMcM 11 years ago

      Actually if you do double blind tests you will find that Bing and Google are indistinguishable. We did this at Blekko earlier with our "3 card monte" gambit where you did a query, got back blekko, bing and google results, and got to pick the one with the "best" results for your query. Blekko usually won if it was a query we had a slashtag for or if it was a "highly contested" query (lots of ad spend like "no fee credit card" or "cheapest insurance"). In the former case our curation meant that more results were appropriate, and in the latter case our spam filtration left us with better results. If it was a general query for which we didn't have a category, and it wasn't highly contested, google and bing split the results, often 40/40/20, sometimes as low as 35/35/30. And if it was a long tail query like "turnip growing in south philadelphia" or something very specific with few sites associated with it, and we didn't have it in a slashtag, Google would "win" those. Microsoft borrowed our idea and did their whole "bing and decide" campaign.

      Many people realize that if you put Google ads on Bing's results and Bing's ads on Google results the profitability would switch (not that I am entirely sure what that says other than having a credible search engine and top end Ad inventory is required to make excess money in search).

      It will be interesting to see if Marissa gets back into the game with Yahoo when their agreement to use Bing results for Yahoo searches expires.

      The interesting linkage is that you can't sell search advertising unless people send the search request to you, and if you're not the most common place that people search, you're unlikely to get first shot at advertising. You can "buy" traffic (that is called Paid Distribution) by putting your search box on people's web site, or causing someone's browser to send you search queries first, or paying a phone maker to send you all their search queries, but you have to make enough money from the ads to offset what you pay. And as I mentioned over the last 8 years Google has been paying more and more for their traffic (up to $968M last quarter) and very few entrants into the business are going to compete with that. If you already have a platform (like Mozilla has Firefox, Apple has the iPhone, Facebook has pretty much everyone's Facebook page) so you "own" the ingress point, you can leverage that with a good search engine to make a lot of revenue. But if you need to pay for access to the ingress point, and pay a big chunk to the ad provider, it is really hard to support a lot of infrastructure (which is proportionally expensive to index size). That is the constraint box of search today.

      The interesting thing for me is that every quarter, of the last 16, Bing has been making more money per click and Google less, that cost equation is balancing out. That is going to put a lot of pressure on the non-core parts of Google.

      To answer your question, Google succeeded well when capturing the value of linkage data to extract page relevance (the original Page Rank patent), they created an advertising incentive which made their algorithm break (you want a billion in-links to your page, no problem! say the black hat SEO folks). Google is still making tons of money on search but you can look at their performance over the last 4 years to see the air is coming out of the balloon. What comes next is still an open question.

      • xxxyy 11 years ago

        I participated in a blind test between Google, Bing and Yahoo in my Information Retrieval class at a university, back in 2013. The results were: 1) Google, 2) Bing, 3) Yahoo - for every standard IR metric we thought of, which included NDCG@{1, 5, 10}, MRR, MAP.

        • ChuckMcM 11 years ago

          Did the results get published? Were the queries "external" or "user generated"? We found it very informative to compare the results of relevance testers (which were people who were shown a query and a set of results) with users (which were people who actually generated the query and evaluated the results). I had hoped to get a study done to get more data on that.

jfuhrman 11 years ago

>In Germany, for example, where Google has over 95% market share, competing search engines don’t have access to adequate past search data to deliver search results that are as relevant as Google’s. And, because their search results aren’t as relevant as Google’s, it’s difficult for them to attract new users. You could call it a vicious circle.

This is interesting because of the browser choice enforced by the EU on Windows. IE whose default is Bing lost share to other browsers like Chrome, Firefox and Opera which all had Google as the default. So an attempt to fix the browser market totally distorted the Web Search market. I wonder why MS didn't request to the EU that the alternate browsers in the browser choice screen had to have Bing as the default search.

I wonder if the EU will mandate that search relevancy data must be shared by Google with rival search engines like DDG just like they mandated that SMB shares and Office formats must be documented by MS and released to developers.

  • dheera 11 years ago

    Ethics and morality aside, I'm curious what allows the EU to "enforce" laws on a US company. Let's say Google and Microsoft don't register entities in the EU. Can they do anything?

    Can Microsoft and other US-based technology companies theoretically just keep doing their own thing, tell the EU government "to hell with it, we're abiding by US laws, you have a choice to stop importing Windows and invent your own OS if you don't like us"?

    • jfuhrman 11 years ago

      Sure they can, but it's a huge market almost on par with the US and it opens them up elsewhere in the world to competition emanating from the void they leave in the EU, i.e. alternate OSes and search engines.

      • dheera 11 years ago

        I get this completely, although let's say Microsoft just ignored their requests for compliance. Would the EU seriously dare to ban Windows? I feel like they'd get outrage from their own locals and topple their own economy if they banned Windows, so they probably wouldn't. Therefore, does Microsoft need to care? Could they just sit around in Redmond and keep developing as long as the US doesn't care?

        As for websites, some countries could just block a US website that doesn't comply with local regulations (e.g. China, Iran, Myanmar, et al.) and this has happened numerous times. But the EU? Having already reached a free speech society? Censoring a US website on the grounds of non-compliance with arbitrary one-sided demands would be at odds with their own established bill of rights.

        Google said "to hell with it" to China and got blocked. But what if they did the same to the EU? I don't think the EU could block a website without causing serious upheaval from their own citizens.

        • fnord123 11 years ago

          They would fine Microsoft. If Microsoft didn't pay they would have their assets seized. Or their credit rating damaged. Their credit rating drop could then put them as junk status and thus mutual funds would have to divest from Microsoft. Microsoft share price would be negatively affected.

          NB: I am not a lawyer or economist.

    • M2Ys4U 11 years ago

      The EU is the world's largest economy, do you really want to shut your entire business out of that market?

      What then happens if a competitor is established to take your former position in the European market - chances are they're not just going to stay in the EU. They're going to eat your lunch elsewhere too.

      • xxxyy 11 years ago

        Similar argument can be applied to China. This is precisely why Zuckerberg is learning Mandarin and networking there.

      • dheera 11 years ago

        How would the EU shut you out? Would the EU actually dare to begin censorship?

        • xxxyy 11 years ago

          No, of course not by censorship. Through: fines for monopolist practices (happened to Microsoft), general smear campaign (happens to Amazon in Germany over working conditions), poking with a stick (the "right to be forgotten"), or just plain old taxes (the new "internet tax" is a current topic in the EU). There is always a way if you are determined enough. Politics.

        • M2Ys4U 11 years ago

          It's not about censorship. It's about doing business.

solve 11 years ago

Other than the index data, there's something even bigger.

Google's biggest PR success is convincing everyone that the quality of web rankings depends almost purely on algorithms. It does not. What allows Google to hold their monopoly is the $100s of millions (or more) they continuously pay to amass more manually created training data:

http://www.theregister.co.uk/2012/11/27/google_raters_manual

http://www.forbes.com/sites/timworstall/2012/11/27/is-google...

A new search engine could appear today with algorithms 10x better than Google, but without access to this scale of training data, their rankings wouldn't even be close to Google's quality.

Google maintains their position by paying cash for this monopoly on training data made by tens of thousands of $9/hour workers, not through superior algorithms!

bobajeff 11 years ago

I think a problem that is happening here is that there is no competition in search just like there is no competition in social networks and operating systems. Not like there are for things like automobiles, electronics and clothing.

Computers introduce a means to lock people in that doesn't exist in other markets. In software products there are often ecosystems that tie directly into the product/service which are not required to be shared with competitors, unlike with road systems for cars.

Regulators ought to look into ways to enforce measures that require the companies to completely open their ecosystem to competitors. Or look into ways to standardize these ecosystems and require every service/application/website comply with them (similar to how media companies are forced to include closed captioning).

  • pain 11 years ago

    "Jobs did great harm to the world with his iThings: computers designed to be jails for their users. His genius was to find the way to make these jails desirable so that millions would clamor to be locked up." —Richard Stallman

    • ntakasaki 11 years ago

      What is more open, a Chromebook or a Windows laptop or a Macbook?

      I would think a Windows laptop or a Macbook because the users and developers can install or develop any application, yet we have everyone singing the praises of heavily DRM'ed and locked up Chromebooks and iPads. Sometimes I feel it's more about Microsoft hate than about a free computing environment. At least RMS is consistent and is less prone to company fanboyism than the tech crowd.

      • eveningcoffee 11 years ago

        I think he was talking about iPhones and iPads that helped to create a market for locked down hardware systems where nor developers of the software and users of the software have no say.

        • ntakasaki 11 years ago

          >where nor developers of the software and users of the software have no say.

          The Chromebook is actually worse, since the iDevices at least give you access to run native applications even if they have to be approved by Apple. On a Chromebook, native applications can only be made by Google.

          • eveningcoffee 11 years ago

            Fair enough, but Apple (Jobs) started this madness.

            • pain 11 years ago

              Basic needs mad to map?

              If madmess means objectification relation issues, are object-relational pattern mapping recognition issues, then we need to address roots causes of offense being made (said infosec minus madus emosec) from the office of authority over self copy.

              Systems thinking rarely accounts for system feeling (https://fb.com/groups/roboswears/permalink/574282622692010). Social contract terms can evolve past pleasure-trauma-based war compeatition business forms of safety, to register mental health as material health, user rights as business rights.

              Data-legal-rational-emotional-empathic business terms are missing.

              Our software and our hardware is taking turns at manipulating real physical and emotional problems by forgetting how many hands it takes to move a machine, and putting a dress over working parts, to light a figurine of determination that barely speaks without hurting.

              Even Steve Jobs had to beat odds that respect objectification before emotional development, coming from acid trips wiring depth to doing ACID tests just to pass text notes, we live with tools that we want to share to help, but business terms are designed to define us as without legal-rational-medical parity to corporal hierarchies of need.

      • castratikron 11 years ago

        Chromebooks are locked up now? I thought you could install your own Linux on them. Don't some even use coreboot?

        • ntakasaki 11 years ago

          You can install Linux on Windows PCs without even needing to developer unlock; doesn't that mean they're as open as Chromebooks? Not to mention things like if the battery goes completely dead on some Chromebooks, Linux is completely wiped along with the data and replaced by ChromeOS. You also have to press Ctrl-D past a scary warning on every single boot on some Chromebooks or flash a new BIOS.

          Can Mozilla make a Firefox for ChromeOS? How many Chromebooks that are being dumped in the education space are having Linux installed on them? Google has root on ChromeOS and the user doesn't. The whole purpose of them is to force the user into uploading all their data into Google's cloud. That's why even a $1400 machine has a paltry 64GB of storage but comes free with a few years of 1TB space on Google Drive.

        • joosters 11 years ago

          Unfortunately you can only install Linux on some of them.

sanxiyn 11 years ago

In South Korea, Google's market share is below 5%, and Naver gets more than 80% of search queries. I think this is the reason why Google's search results for Korean contents are not as good as contents in other languages.

jjoe 11 years ago

So the whole push for SSL/https from Google has been opportunistic rather than good practice. I mean why would a search engine go as far as to make SSL a ranking signal?

  • dheera 11 years ago

    Sites that use or at least offer SSL probably also tend to be higher-quality sites. The combination of verified identity and payment means that it's a natural filter for people who are at least semi-serious about their project.

  • pixl97 11 years ago

    > I mean why would a search engine go as far as to make SSL a ranking signal?

    Because any number of 3rd parties have been injecting their ads and other crap as MITMs. SSL is a better, but not foolproof way to make sure the content you get was the content served by the remote server.

  • nostrademons 11 years ago

    It could be opportunistic and a good practice. Users do benefit from sites that offer SSL. It's just that Google benefits too.

ocdtrekkie 11 years ago

It makes you wonder how many changes were made for "privacy" and how many changes were made for "protecting our business".

  • stevenbedrick 11 years ago

    Is it necessarily an "either/or" situation here? This seems to me like an example of a "both/and".

    • ocdtrekkie 11 years ago

      That's fair. I just wonder which half was the selling point that made the change happen.

      • geoelectric 11 years ago

        It's honestly hard to say. Privacy is a selling point, especially nowadays.

        My guess is that the proposal probably included the cliche "win/win situation," had already been sitting in someone's back pocket, and the raising of it was either sparked by some privacy-related news story -or- a market event of some kind. At the end of the day, it doesn't really matter.

        I think there are a handful of techs that lend themselves to natural monopoly--basically anything where the expense of building sufficient infrastructure for a minimally-competitive product requires previous success in the market. This is true whether the infrastructure is copper lines or a body of previous searches.

        That means you're either one of the first ones there with low cost of entry and building on your own successes (Google); or you're shifting to the market from success in an unrelated area (Bing); or you're essentially locked out unless you can somehow acquire access to that infrastructure.

        My guess is search will eventually turn into an antitrust-regulated industry. Really depends on whether up and comers like DuckDuckGo can really stay relevant based on ideology and whether old players like Yahoo can really re-enter the market successfully.

        But the most likely scenario really appears to be a duopoly between Google and Bing at best, and more likely simply a monopoly for Google.

        The analogous solution to telecom would be forced access to search queries for alternative providers (a la CLEC/ILEC) but privacy concerns will make the situation interesting to say the least.

        Possible it may eventually turn out that mainstream search engines simply have no specific privacy protection, at least for aggregate data. Since that's in both the corporations' (market leaders aside) and government's best interest, seems plausible. That'd be a lot of power behind it.

pcl 11 years ago

Interesting. I wonder to what extent this reasoning was behind executive support of the Chrome project, and whether it was a factor from the onset or something that Google stumbled upon after developing a browser.

  • sanxiyn 11 years ago

    I am 100% sure this is the reason Chrome was funded. (I don't doubt Chrome developers' goal was to develop the best web browser in the world, but the business case for doing so is a different matter.)

    • rockdoe 11 years ago

      Chrome was also an insurance policy. You can't buy away Google being the default search engine in Chrome. Imagine pre-Chrome browser marketshares and imagine the impact the Firefox-Yahoo deal would have had.

  • systemBuilder 11 years ago

    In my opinion, Chrome was funded for two reasons,

    (a) At the time Chrome was launched, IE was dominating with ~69% market share: https://d28wbuch0jlv7v.cloudfront.net/images/infografik/norm... And Firefox/Mozilla was topping out at 25% market share! They were basically resting on their laurels! Remember that the SPDY protocol, which is the prototype standard for HTTP 2.0, was invented at Google and was the main innovation within Chrome 1.0. If you do a timed google search 2008-2010 for SPDY you will see that the SPDY whitepaper page was Nov 12, 2009: https://www.chromium.org/spdy/spdy-whitepaper So Chrome was launched to make web browsing faster.

    (b) Google Search does not want to be excluded from all browsers. The solution to this problem is to fund your own browser. If IE will dominate Firefox forever and Google was depending on Firefox defaults for much of its search traffic, then Google was virtually FORCED to create its own browser or they could always be limited to 25% (or less) search traffic share.

    I think that having a "Browser account" which synchronizes browser bookmarks and settings and history across all instances of Chrome for a given user, is one of the greatest improvements in browsers in the past 5 years, and all other browsers seem to be copying this idea. If google were the evil empire as you imply, it would be suing the pants off these other browsers, but it is not.

ntakasaki 11 years ago

>In 2011, Google famously accused Microsoft’s Bing search engine of doing exactly that: logging Google search traffic in Microsoft’s own Internet Explorer browser in order to improve the quality of Bing results.

MS didn't do that from IE, they did for users who installed the Bing bar, a huge difference.

Metapilot 11 years ago

I think the author's perspective is skewed in order to stay in line with the title. Here's an example of why I say that:

The author states that "For some 90% of searches, a modern search engine analyzes and learns from past queries, rather than searching the Web itself, to deliver the most relevant results." This may be true in some types of searches but overall, I think the statement is misleading.

Rather, it's better to think of it like this: One important part of the algorithmic process involves constantly crawling the web and updating the index with new information. (Important / frequently-updated web sites may get crawled all day every day, while ones that are less important may get crawled only weekly or monthly.) Meanwhile, another part of the algorithmic process constantly analyzes new info discovered in the crawl and combines it with, as the author mentioned, click-through data learned from past queries.
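
To make the crawl-frequency part concrete, here is a toy scheduler sketch. It assumes each site already has a recrawl interval in hours (hourly, daily, weekly); those intervals, the example hostnames, and the `schedule_crawls` name are hypothetical stand-ins for whatever importance or freshness score a real engine computes, which nobody outside the engine knows.

```python
import heapq

def schedule_crawls(pages, horizon_hours):
    """Return (hour, url) crawl events within the horizon, in time order.

    `pages` maps url -> recrawl interval in hours. Every page is crawled
    once at hour 0 and then again each time its interval elapses.
    """
    # Min-heap keyed on the next due time, so the most overdue page pops first.
    heap = [(0, url) for url in pages]
    heapq.heapify(heap)
    events = []
    while heap and heap[0][0] < horizon_hours:
        hour, url = heapq.heappop(heap)
        events.append((hour, url))
        # Re-queue the page for its next visit.
        heapq.heappush(heap, (hour + pages[url], url))
    return events

# Hypothetical sites: one refreshed hourly, one daily, one weekly.
pages = {"news.example": 1, "blog.example": 24, "archive.example": 168}
events = schedule_crawls(pages, horizon_hours=48)
counts = {url: sum(1 for _, u in events if u == url) for url in pages}
```

Over a 48-hour window the hourly site is fetched far more often than the weekly one, which is the whole point: the crawl budget follows how quickly the content is expected to change.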

The answers to many queries don't change, while the answers to many other queries deserve freshness. For example, I'm quite certain Einstein's date of birth hasn't changed in quite a while, but his theory of relativity is in constant discussion and there is always new information and new queries pertaining to it. As a result, there is not much need for a search engine to go digging for the latest info on an "einstein's birthday" query, but it's to everyone's advantage that Google is able to identify which pages on the web deserve priority crawling and that Google has retrieved and incorporated the fresh info those pages contain into its index when it comes to a topical type of query like "diffraction of light with quantum physics".

In the end, the results to every query depend on info gathered from the web and user data helps refine the results. Info that is more static can be prioritized with more input from click-through data, while new information found on the web must rely more on Google's artificial intelligence to push it up in front of searchers.

Another reason that the "90%" statement sticks out to me is that there is a fairly often-used factoid tossed around by industry experts that between "6% to 20% of queries that get asked every day have never been asked before." Google can't rely heavily on past query data for all of these types of searches.

  • solve 11 years ago

    You're vastly underestimating the uniqueness of search queries these days. Various sources within Google have said that 25% to 50% of queries entered into Google have never been seen before at all.

wmf 11 years ago

So does Mozilla's contract with Yahoo allow Mozilla to track query data and maybe feed it to underdog search engines like DDG or Blekko (oops)?

  • minthd 11 years ago

    AFAIK, the deal with Yahoo was about putting Yahoo search in the front. If it was about tracking Google search data, Mozilla should have at least let people know, especially with their claim of protecting privacy. And if they lie, they risk a very strong response, especially from developers they depend on.

    Also, if such changes were to be made, there's a decent likelihood that someone would have noticed that data leakage and told us about it.

    So since mozilla is a pretty decent company, we should currently give them the benefit of the doubt.

    • rockdoe 11 years ago

      I don't see any reason to doubt anything or for that matter give anyone "the benefit".

      Firefox is still open source, unlike IE, Safari and Chrome, so just look.

      • minthd 11 years ago

        Yes you're right. They probably couldn't do those games even if they wanted.

  • rockdoe 11 years ago

    I think the point is more that the contract with Yahoo in the USA doesn't prevent Mozilla from making deals with smaller[1] players elsewhere (which they've been doing).

    [1] Smaller than Google. The search box isn't given away for free.

ekr 11 years ago

So that's why Google created the Chrome browser.

  • ntakasaki 11 years ago

    Not just created, but bundled and installed by default with Java and Flash updates, some of which also install the Google toolbar into IE. Many folks that I had converted to Firefox from IE back in the day use Chrome now and have no idea how it ended up on their computer. This explains the steady rise of Chrome, not the few percent of tech geeks that installed it by choice.

minthd 11 years ago

So, since Google tracks the full browsing experience of chrome users, and hence gets more relevant data than for other browsers users, it has the theoretical ability to offer better search results to chrome users.

Has anybody noticed this happening?

  • asuffield 11 years ago

    (Tedious disclaimer: my opinion, not my employer's. Not representing anybody else. I work at Google, not on Chrome.)

    Google does not "track the full browsing experience of chrome users". Please read the privacy policy which is very clear on this subject: https://www.google.com/chrome/browser/privacy/

    I particularly draw your attention to this paragraph: "If you use Chrome to access other Google services, such as using the search engine on the Google homepage or checking Gmail, the fact that you are using Chrome does not cause Google to receive any special or additional personally identifying information about you."

    • minthd 11 years ago

      Maybe i'm reading this wrong, but this sounds like Google gets your browsing history:

      "If you sign in to Chrome browser, Chrome OS or an Android device that includes Chrome as a preinstalled application with your Google Account, this will enable the synchronization feature. Google will store certain information, such as HISTORY, bookmarked URLs as well as an image and a sample of text from the bookmarked page, passwords and other settings, on Google's servers "

      And this isn't that far from full browsing behavior. And that's from a few minutes reading this page; we don't know if they track deeper details, like how long the page was open.

      Also - Google doesn't have to collect this data. The claimed purpose of this is that you could share history on multiple devices. But this can also be achieved by sending encrypted history to Google and decrypting the history on each device you use(i think browser extensions with similar functions implement this in that way). So it's clear the purpose here is collecting data.

      • snowwrestler 11 years ago

        How is this feature any different from the Firefox, IE, and Safari browser sync services?

        The fact that Google synchronizes data between browsers does not necessarily mean their search application has access to it as a ranking signal.

      • asuffield 11 years ago

        Notice that this is a feature you have to turn on (try it!). Obviously in order to perform cross-device synchronisation, it's necessary to send this information.

        I'm not free to discuss the details of how these systems work, but consider this: if both the statements "Google will store this information" and "the fact that you are using Chrome does not cause Google to receive any special or additional personally identifying information about you" are hard requirements, how would you implement this feature?

        • minthd 11 years ago

          Of course "the fact that you are using Chrome does not cause Google to receive any special or additional personally identifying information about you." could be true.

          But agreeing to sign in to the history sync feature is something different, not covered by "using Google Chrome". So now Google is free to use your history.

          And let's be realistic here. Most people don't think about the implications of logging in to Google (even if it says it syncs bookmarks, history, etc.) and probably don't read the instructions. Many don't even understand what it means. And realistically most people see a Google login box which they have filled in a million times, and fill it in once more, as a sort of Pavlovian response.

        • minthd 11 years ago

          >> Notice that this is a feature you have to turn on (try it!)

          This isn't true, according to this post in the Google product forums (and others like it, where people complain about unwanted sharing and ask how to stop it):

          https://productforums.google.com/forum/#!topic/chrome/hOk8r9...

          And personally I've tried it, and that's false - the default is history sharing.

          I don't want to be rude, but are you some kind of a troll, or just enjoying spreading lies?

          • asuffield 11 years ago

            The answer given there is simply not correct. Click: settings, advanced sync settings, uncheck "history".

            I am quite confident that the default behaviour when you install chrome is that you are not signed in. It definitely doesn't share any history when you're not signed in. It wouldn't make a lot of sense to have a login box otherwise ;)

            I also feel that the message you get when you decide whether or not to sign in to chrome makes the purpose of this feature quite clear: "Sign in to get your bookmarks, history and settings on all your devices." Here's the detailed page it links to: https://www.google.com/intl/en_uk/chrome/browser/signin.html

            I'm not sure what behaviour you were expecting, can you clarify?

            • killwhitey 11 years ago

              Fresh install of Chrome. https://i.imgur.com/4ZyuSNN.png Notice how little "No thanks" and "Choose what to sync" are. Nice anti-pattern.

              And if you simply sign in and don't choose anything, this is what you get https://i.imgur.com/XQSxwu5.png

              • asuffield 11 years ago

                That all seems clear and reasonable to me. I double checked the size of 'No thanks' and 'Choose what to sync' in the chrome dev tools: they are 13px, which is exactly the same size as 'Sign in' and 'Need help'. Signing in without making a specific selection gives you what the third line on the screen said it would do.

                What, if anything, is wrong with this?

                • anon1385 11 years ago

                  Good grief. Are you seriously comparing the text in a huge, boldly coloured and centred button with an undecorated link outside of the main UI element and claiming they have equal weight on the page just because the basic text height is the same?

                  When people talk about employees at large tech companies being in a disconnected bubble of delusion, this is the kind of thing they must mean.

                  • asuffield 11 years ago

                    The grandparent said it was smaller, and I was merely observing that it is not. If you want to talk about colours, the colours used in both locations are the same: one is white on blue, the other is blue on white. I suppose you can claim that one is closer to the centre than the other, but this feels like splitting hairs.

                    When I look at this page, these buttons both seem quite readable to me, and I don't feel that either of them is concealed or hard to press. I do not believe that this page is deceptive or misleading in any way. It clearly and directly tells you what it does. And let's not forget that it's not doing a bad thing, it's offering people a feature which appears to be popular. If you're particularly concerned about adding another layer of privacy defence on top of the already fairly formidable ones that you get by default, you can add your own passphrase to encrypt the information locally on each device, and you can examine the source in chromium if you want to be sure you know what it does.

                    If you're complaining that the people who designed the feature are trying to encourage people to use it, then frankly I think your complaint is unreasonable.

                • sanxiyn 11 years ago

                  You know it is wrong, because it misleads users. UI is not about technical accuracy, it is about what users do.

                  • asuffield 11 years ago

                    How do you believe this is misleading users? It seems quite clear to me, and while this is anecdotal rather than data, based on the people I know the feature seems to be quite popular.

        • rockdoe 11 years ago

          Notice that this is a feature you have to turn on (try it!).

          Can you use an Android phone without this? AFAIK that's essentially impossible without very deep technical knowledge.

          • asuffield 11 years ago

            Yes, you can. If you use a clean Nexus build then last time I tried it, the behaviour was that the first time I opened Chrome it asked me if I wanted to sign in to chrome to share my history and bookmarks between devices, and I had a simple yes/no choice. I don't think this needed any deep technical knowledge.

            I'm not intimately familiar with the android chrome settings UI, so I apologise if I've missed an easier way, but in about 30 seconds of tapping I found the button to turn it off: main settings page, accounts, my google account, sync, uncheck chrome.

    • ksk 11 years ago

      The Google Chrome omnibox is a keylogger that transmits all requests to Google. And the "prediction/auto suggest" and "phishing and malware" features allow Google to track all URLs visited by Chrome users. The "usage statistics" feature collects even more data. Normally, for a company that ships internet software, many of these features would be considered harmless, even beneficial. But for advertising companies like Google, who make their money by vacuuming up users' data and allowing other companies to bid on their profiles, I think it's almost impossible to trust you guys.

      Also, FYI, the privacy policy is irrelevant. It only states what a company might or might not do. The Terms of Service is the real deal - legally speaking - which of course Google can and does modify at will, anytime they want to collect more data.

      ----------

      "When you upload, submit, store, send or receive content to or through our Servicesǂ, you give Google (and those we work with) a worldwide license to use, host, store, reproduce, modify, create derivative works (such as those resulting from translations, adaptations or other changes we make so that your content works better with our Services), communicate, publish, publicly perform, publicly display and distribute such content. "

      "This license continues even if you stop using our Services (for example, for a business listing you have added to Google Maps). "

      http://www.google.com/intl/en/policies/terms/

      https://www.google.com/chrome/browser/privacy/eula_text.html

      ǂYour use of Google’s products, software, services and web sites (referred to collectively as the “Services”

    • rockdoe 11 years ago

      That's specifically limited to Google services. It says nothing whatsoever about the rest of the internet!

      Also note that for Google services there's no point in collecting that data because Google already has it.

tokai 11 years ago

Training data is nice, but I think it's important not to underestimate capacity for crawling. IMO one of Google's strengths is that they crawl large quantities of new content. Smaller operations like DDG can't crawl at that scale. If I want discussion of new bugs, want to search the articles at my favorite news page (where the in-house search is unusable), or just want the newest blog post on some subject - Google is hard to beat.

PaulHoule 11 years ago

At this point Google is not winning because it's search results are good (have you used Google recently?), it is winning because it makes almost 10x as much revenue as other search engines do per view -- at that rate any other search engine is running a charity.

  • minthd 11 years ago

    It's really pretty weird. Google certainly has the capabilities to offer a great search experience, but it's very inconsistent.

    For example, after learning that I like the results from certain journals, their personalization engine offered me those in related searches, and usually I chose content from them.

    But somehow, after some time, Google's personalization engine forgot that I like them and stopped offering me content from them, so I'm back to drowning in shitty results. Why? No idea.

  • vixen99 11 years ago

    its

countrybama24 11 years ago

Seems like there is a business opportunity to build a plugin of sorts that allows users to opt in and share their search data with competing platforms. I'd be interested in donating my data to help a rival engine compete with Google.

thallukrish 11 years ago

Only when users own their data, meaning apps are just logic and users can selectively grant access to whomever and whatever they choose, will we find more genuine things reaching the user, be it commerce or content.

thrownaway2424 11 years ago

It is unsettling to read this kind of chip-on-my-shoulder opinion piece full of innuendo under the Firefox logo and the Mozilla name but on the author's personal domain.

Semiapies 11 years ago

TL;DR - Yahoo! still exists and resents Google. But not for being better in their niche, no. Just for delivering a better service, which is not at all the same thing. Somehow.

asuffield 11 years ago

(Tedious disclaimer: my opinion, not my employer's. Not representing anybody else. I work at Google, not on search quality)

This article makes a number of bold claims about the contents of data and code which its author hasn't seen, and is written by a company that is receiving a large amount of money from Yahoo. I would encourage people not to forget these details.

  • rockdoe 11 years ago

    So just point out where it's wrong, instead of making a fairly disingenuous appeal to non-authority or however you want to call it?

    • asuffield 11 years ago

      I don't speak for the company, but I don't think we're going to respond to an attack piece by the Mozilla CTO by disclosing how our search algorithm works. ;)

      In any event I'm not a person who can decide to release that information. All I can do here is to ask people to think about what evidence has been offered and the motives behind this article.
