Well-intentioned websites get caught in Google’s algorithmic penalty box
seerinteractive.com
The client site, http://www.autoaccessoriesgarage.com, is engaging in cloaking.
Go to http://www.autoaccessoriesgarage.com/Seat-Covers/
Use the picker to pick a particular make and model: http://www.autoaccessoriesgarage.com/Seat-Covers/_Acura-RDX?...
So far so good, no problem. Your browser now has a cookie that says you're interested in just this make and model. Now for the problem: use the nav links to go to "Cargo trunk liners", and where do you land?
http://www.autoaccessoriesgarage.com/Cargo-Trunk-Liners
That's cloaking -- it's not showing you all of the liners, just the ones relevant to the make and model you picked earlier. Instead, the site should add _Acura-RDX?year=2008 to the URL, just like before.
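To make that concrete, here's a rough sketch of the kind of link-building I mean (a hypothetical helper, not the site's actual code) -- every category link carries the visitor's previously selected vehicle in the URL instead of leaving it in a cookie:

    # Hypothetical sketch: carry the selected vehicle in the URL rather than
    # in a cookie, so a crawler and a cookied user see the same page at the
    # same address. Names are illustrative, not the site's real code.
    from urllib.parse import urlencode

    def category_url(category, vehicle=None, year=None):
        """Build a category URL that preserves the current vehicle selection."""
        url = "/" + category                        # e.g. /Cargo-Trunk-Liners
        if vehicle:
            url += "/_" + vehicle                   # e.g. /_Acura-RDX
        if year:
            url += "?" + urlencode({"year": year})  # e.g. ?year=2008
        return url

    print(category_url("Cargo-Trunk-Liners"))                     # unfiltered
    print(category_url("Cargo-Trunk-Liners", "Acura-RDX", 2008))  # filtered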
Why do search engines care about this stuff? Now imagine you type [auto accessories cargo trunk liners] into your favorite search engine, and the result is http://www.autoaccessoriesgarage.com/Cargo-Trunk-Liners ... what does the search engine think you'll see? It has no idea, really.
Google disagrees with your assessment:
https://productforums.google.com/forum/#!searchin/en/cookie$... (see the response marked as the best answer by Matt Cutts, head of web spam at Google).
If I had never been to the site, I'd land on the unfiltered page, which would be a good result. If I had a cookie (which, from a quick look, seems to be a session cookie), then it's likely I was at the site recently, so the filters are probably relevant; if not, they're easy to change.
'Cloaking' has negative connotations and is more of a concern when there is an attempt to mislead search engines. In this instance, there is a big problem with your suggested fix -- the Panda algorithm would see many very similar pages, which might actually make things worse (which I agree is silly, as your solution would otherwise have some upsides, but there is often a trade-off in these situations).
That's a simplistic way of thinking about the problem -- as a search engine professional (not an SEO), I'd never recommend something that depends on Googlebot figuring out that I'm not really cloaking.
The duplicate content problem you describe is fixable (edit: and it's already a problem; I'm only recommending changing links, not adding any pages to the site).
And by the way, there are plenty of websites that force crawlers to use cookies in order to crawl the site. I don't know how Googlebot deals with that, but I bet it involves crawling with cookies... no matter what the forum post says.
Yeah - I don't deny that there's possibly some level of risk. But if your concern is "Googlebot figuring out that I'm not really cloaking" based on the presence of cookies, then I'd challenge (what I think is) your implication that having cookies on your site means Googlebot might suspect you of cloaking.
As to Googlebot's use of cookies - there is debate and folklore, but in the tests I have run, I have never seen Googlebot send back a cookie that I sent it.
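The test itself is easy to reproduce. A minimal sketch of what I mean (Flask, with made-up names; your logging setup will differ) -- set a probe cookie on every response and watch whether requests identifying as Googlebot ever send it back:

    # Minimal test sketch: set a probe cookie and log whether Googlebot ever
    # echoes it back on later requests. Illustrative only.
    import logging
    from flask import Flask, request

    app = Flask(__name__)
    logging.basicConfig(level=logging.INFO)

    @app.before_request
    def log_crawler_cookies():
        ua = request.headers.get("User-Agent", "")
        if "Googlebot" in ua:
            logging.info("Googlebot hit %s with cookies: %r",
                         request.path, dict(request.cookies))

    @app.after_request
    def set_probe_cookie(response):
        response.set_cookie("crawl_probe", "1")  # the cookie we hope to see back
        return response

    @app.route("/")
    def index():
        return "ok"

(In practice you'd also want to verify that the user agent really is Googlebot, e.g. via reverse DNS, since anyone can spoof the header.)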
Google does manual reviews of pages, and I am confident the site in this example (for the case in question, at least) would pass such a review without a problem.
I'm (genuinely) interested in your proposed solution for dealing with the duplicate content problem. The trouble with the Panda algorithm is that it tends to be a bit touchy, and it seems easy to fall foul of it even in innocent situations like this one.
That's not my implication, nor what I said! I said that this website should choose a link method that is unambiguously not cloaking. Then there's no chance that you'll confuse search engine bots.
The duplicate content issue is not in play for my suggestion; as my edit above states, I'm only recommending changing links, not creating any new URLs.
Exactly +1
That's sort of a problem. Also, look at these pages:
http://www.autoaccessoriesgarage.com/Seat-Covers/_Nissan-Alt... http://www.autoaccessoriesgarage.com/Seat-Covers/_Hyundai-So...
The pages look identical, and if you think you're fooling Google into thinking they're very different with a couple of keyword-stuffed paragraphs of text, think again. Now open both in separate windows and click a product. The product page itself doesn't change except for the vehicle name inserted via a cookie. It looks like you're just mass-generating category and product pages dynamically, which is probably what you're doing.
Don't get me wrong -- I feel your pain, and funnily enough, I've solved this EXACT SAME problem on a similar car accessory site. Maybe I can offer some advice.
Your main Panda problem is that you have a page for every type of product for every make and model. That's a LOT of nearly identical pages. You need to consolidate them somehow. Easier said than done, right? You don't sell all products for all vehicles, and you want users to have an organic landing page when they search for something like "[make] [model] [accessory]".
Instead of generating these landing pages and making up text, I'd use a filter on your car covers page that gives the user a URL variable that stays with them until they change their make/model. This also frees you from the need to make up pointless mass-generated paragraphs.
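Something like this sketch (Flask-style, with hypothetical routes and parameter names) -- the vehicle rides along as a query parameter on internal links until the user changes it, so nothing about the page depends on a cookie:

    # Illustrative sketch: propagate the selected vehicle as a URL parameter
    # on internal links instead of storing it in a cookie.
    from urllib.parse import urlencode
    from flask import Flask, request, render_template_string

    app = Flask(__name__)

    NAV = """
    <a href="{{ link('/car-covers') }}">Car covers</a>
    <a href="{{ link('/cargo-trunk-liners') }}">Cargo trunk liners</a>
    """

    def link(path):
        """Append the current vehicle selection, if any, to an internal link."""
        vehicle = request.args.get("vehicle")
        return f"{path}?{urlencode({'vehicle': vehicle})}" if vehicle else path

    @app.route("/<category>")
    def category_page(category):
        # Filtered and unfiltered views live at distinct URLs, so a crawler
        # and a user always get the same content for a given URL.
        return render_template_string(NAV, link=link)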
This truly is frustrating, because the site is actually functioning in a way that makes sense for the user, and Google is penalizing them for it.
Excellent point, Greg -- the article does address this, though; check the "U/X vs. Googlebot/X" section.
Right, I've added in the right word (cloaking) and a much better fix. Adjusting cookie policy for crawlers is a bad idea.
This is a great article and I wish it would get more attention. Very often a site gets penalized without any rhyme or reason and the end user has no way to find out why or how. It's just darkness, and Google doesn't care.
I submit that they do care, but that the costs of telling people how the penalty system works outweigh the benefits of doing so.
I agree there, Kalium... Google is a company of scale, and they think in broad strokes (part of why they are so successful). I don't think I'm asking them to tell us everything about how it works, but they managed to create a messaging system for Penguin penalties, so what is it about Panda that makes that a harder task? I'm definitely more attached to this than most, so I know it might be an anomaly, but as someone who encourages clients and sites to create the type of content Google wants and values, it's hard to watch a site trying to do it right, potentially making missteps, and being completely lost on what to do next.
Perhaps they learned from previous messaging that it helps spammers more than it helps people like you.
Thanks John! Appreciate the support... I'm not saying we didn't miss anything, but after banging our heads against a wall for 9 months, at least a small confirmation of where we stand would help. These are real people making tough decisions about their business because of it, and it's not a case of them spamming or trying to manipulate Google in this instance.
Bear in mind that what you're asking for is in every detail exactly what the spammers and manipulators want to know.
Definitely not every detail. Even a simple - "Penalty for duplicate content," "Penalty for thin content," or "Penalty for low quality content," would provide some context. Was it manual or automatic? These are all "best practices" that most spammers know but since Panda is a broad algorithm we have no context on which portion is hurting us. And I would say most Panda clients we've worked with have unintentionally been caught up by duplicate content, etc.
I think the problem is that the "Panda" algorithm seems to apply essentially the same criteria to all sites. Depending on the topic of the site (e.g., an ecommerce site selling products or a ticket site selling tickets), lots of pages tend to have the same content. Some product pages have the same product in a different color, or the ticket site is selling the same tickets as some other site on the web.
I get that Panda can be helpful in identifying "low quality" content. But the true definition of "low quality" changes depending on the industry and the category of products being sold.
A good algorithm should be able to distinguish between the various sites or topics of sites and apply its criteria differently, right?
I would agree - however, it seems like Google is in a position where their algorithm doesn't necessarily have to be good, just good enough. And good enough basically means they don't torch any high-profile sites that would begin to make the public question whether or not Google is still the best search engine. Unfortunately, this means that many smaller businesses might get thrown out with the bathwater :(
For sure - I've seen ecommerce sites intentionally bloat category pages with tons of written content for the sake of not appearing thin. That's definitely not the experience that users want. I will say, though, that I've tended to see a little more objectivity specifically on ecommerce sites, with the algorithm being more forgiving about sheer volume of content (though not in this case).
It's definitely hard and algorithms will get better with time as Google understands more and more of the web, but in the meantime, give us a heads up so we can fix it :)
Intentionally bloating category pages with tons of written content, unfortunately, is the result of Panda. Sure it's not good for users, but that's what happened, simply because those ecommerce sites had to adapt if they were going to compete or have a leg up over their competition.
And I'm not sure if Google took that into consideration when they launched Panda. If they had, we wouldn't be seeing the intentional bloat.
The same exact thing happened when Google launched Penguin. I bet Google didn't realize that they would start a whole new industry, the shady industry of making site owners pay money to get low quality links removed. But I digress...
@cgingrich - Perhaps you guys are only consulting on this penalty, but I found it strange that, for all the focus on SEO basics, the home page has no H1 tag (maybe seen as deprecated and unimportant by your team) and the main slider is all text hidden in the image, with no plain text for the crawler to crawl!
But good write up overall, love these case studies.
Google has, in the sense that quantitative finance people use the term, "burned" their data. That is, they're using statistical methods to extract signal from noise, and they've done this so much that they're nearing the noise threshold. When a data set is over-analyzed in this way, the impact of irrelevant data items becomes excessive. That's what's happening here.
Search spam detection has improved over the years, but it's fundamentally aimed at detecting sites that "look like spam". In response, search engine optimization has become more about making clickbait sites look less like spam, even to humans. It's now hard to tell a clickbait journalism site, one filled by low-paid article rewriters, from one that has actual reporters. (Business Insider is owned by the founder of DoubleClick.) Looking at the superficial properties of a site is no longer a reliable spam indicator.
The big search indicator used to be links. That's what "PageRank" was about. Links stopped working because most links to business sites now come from social media and blogs, and those are really easy to spam. Anyone who runs a blog now can watch the phony signups and posts come in. There's a whole industry selling phony Google and Facebook accounts for SEO purposes. Google has responded by disallowing many sources of links, with the result that the remaining link data is sparse for many sites.
Google isn't looking at the business behind the web site. Here, Auto Accessories Garage sells auto parts. Find the business behind their web site, and you can verify that they are in the auto parts business. Their site is full of auto parts. Therefore, not spam. Google doesn't do that. That's why they failed Auto Accessories Garage.
At SiteTruth, we look at the business behind the web site. Here's what we're able to find out for Auto Accessories Garage.[1] This is the internal details page; users rarely look at this. We give them a good rating. We didn't, unfortunately, get a proper match to corporate records because their corporate name is Overstock Garage, Inc. (We don't have a full D/B/A business name database for dealing with such problems yet.) SiteTruth picked up the Better Business Bureau seal of approval on the site, cross-checked it with the BBB for validity, and noted the "A+" rating there. Not a spam site.
The process is completely transparent. The link below lets you see all the data SiteTruth looked at for Auto Accessories Garage. Because it's checking against hard data from external sources the site can't control, there's no need to be mysterious about how it works. There's a vast amount of data available on businesses. If you tap into Dun and Bradstreet (we can do this, but can't turn it on for public viewing by free users) you get in-depth financial data on companies. That allows real supplier evaluation, far beyond what Google can do.
The SiteTruth approach does a good job on real businesses that sell real stuff. There are objective measures for such businesses - revenue, years in business, BBB ratings, even credit data. Google doesn't use those, and Google fails real-world businesses because they don't.
If you want to try looking at SiteTruth ratings, try our browser add-on from "sitetruth.com". We put those ratings on search results from Google, Bing, Yahoo, DuckDuckGo, etc. Now on Firefox for Android, too. End self-promotion.
[1] http://www.sitetruth.com/fcgi/ratingdetails.fcgi?url=www.aut...
How come your site thinks this company is the same thing as AutoZone? The original blog post above identifies it as an independent family owned retailer.
Those are possible matches. None of them matched on address, but they matched on partial business name, city, state, and ZIP. We didn't get a solid match because they use a different D/B/A name than their company name, and we don't have a US D/B/A name list. We're using a marketing-quality database of US businesses for free demo purposes. The high-quality database to get this consistently right costs about $800K a year with daily updates.
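For the curious, the match tiers work roughly like this (a much-simplified illustration, not the production matcher, which runs against the commercial database mentioned above and uses fuzzier name handling):

    # Simplified illustration of the match tiers: "possible" needs a partial
    # name match plus city/state/ZIP agreement; "solid" additionally needs the
    # street address to line up. The real matcher is fuzzier than this.
    CORP_NOISE = {"inc", "inc.", "llc", "corp", "co", "the"}

    def _words(name):
        return {w for w in name.lower().replace(",", " ").split()
                if w not in CORP_NOISE}

    def names_overlap(a, b):
        """Loose partial-name match: the two names share a significant word."""
        return bool(_words(a) & _words(b))

    def classify(site, record):
        if not (names_overlap(site["name"], record["name"])
                and all(site[k] == record[k] for k in ("city", "state", "zip"))):
            return "no match"
        return "solid" if site["address"] == record["address"] else "possible"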
Of course. Computer algorithms are not perfect.
No doubt... I think the main question, though, is: how can Google encourage and help out well-meaning site owners when the algorithm gets it wrong?
Even just re-running it regularly (and letting site owners know that it's been re-run) would be helpful - as the issues pointed out in your article were largely fixed many months ago, the lack of any sort of movement is unfair and disheartening.
I've come to the same exact conclusion in the same vertical. My sites were hit in almost the exact same way.