Why does searching Google for random hex lead to car dealers? [video]
tmp.tonybox.net
Looking at this very briefly, the results seem to always be inventory pages for the dealerships, which use long strings of hex or just random numbers as identifiers for the vehicles they have for sale.
For example, a search for "ca7112b7167c15e621412c0fbc0a6c97" brings up the URL "https://www.premierclearancecenterofstbernard.com/inventory/...", which has a gallery of vehicles at the bottom whose image names are of the format "9b362510c100095f02cf3cad9e365ea6.jpg".
I assume something inside the Google black box is saying "well, there's no exact match but this site has a bunch of strings with most of the same characters, so here you go".
Edit: And to add to this, I'd surmise that the reason you see a lot of car dealerships in these results is that they sell a lot of one-offs - instead of having a list of SKUs in inventory, they sell a unique vehicle just once, so the inventory systems need to account for that by using long strings as item IDs and the like. Also there's probably a limited number of inventory systems out there, so a bunch of random dealerships are probably all using the same one.
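The "most of the same characters" guess above is easy to make concrete. The Python toy below is hypothetical (nothing indicates Google scores queries this way): it computes the Jaccard similarity of the sets of distinct characters in the search term and in one of the dealer image names quoted above. Because random hex strings all draw from the same 16-symbol alphabet, even two unrelated IDs overlap heavily under a measure like this.

```python
def char_set_jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the sets of distinct characters in a and b."""
    sa, sb = set(a.lower()), set(b.lower())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


query = "ca7112b7167c15e621412c0fbc0a6c97"     # search term from the thread
image_id = "9b362510c100095f02cf3cad9e365ea6"  # image name on the dealer page

score = char_set_jaccard(query, image_id)
print(f"{score:.3f}")  # prints 0.733 (11 shared of 15 distinct characters)
```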
Back when Google search was good, this query would have returned no results, as it should. Now it desperately tries to dig up anything it can find just so the number of results is not zero. Somebody at Google wanted to increase the search 'hit rate' KPI and this is the result.
If you put quotes around the string (the "exact match" operator), the only results are this very thread. So it seems to be working as intended.
Basically, you did a fuzzy search and got a fuzzy result. Usually that's what people want. Quotes will let you fine-tune results. Or if you want all results to be strict by default, use verbatim mode. I tested that with the above string and again, only this thread showed up.
But it’s clearly not what people want. Ask any person if a search for a hex encoded ID should be a fuzzy match for a different ID and the answer will be no.
As technical people, it’s easy to infer what’s happening under the hood and make excuses for the weirdness. But good product design is about having strong opinions about what should happen, and ignoring our biases around the limitations of the tech or the status quo.
In an age where I can have an entire conversation with a computer or generate a video from text, the world’s greatest search engine still doesn’t understand that you can’t fuzzy match an ID? It increasingly feels like Google search is stuck in the past.
Who is this "any person" who's searching for random hex, and how much do you think they care if Google shows them a car instead of whatever thing they're not even actually looking for?
The idea that this mythical "any person" even cares about the difference between a useless car result and a page that says no results, rather than just moving on with their life, is projecting a lot of your own biases onto a hypothetical.
Obviously no one would search a completely random hex, but it may represent an ID somewhere, and they want to find out more information about it. e.g. a SHA or MD5 hash.
Agreed, you can look up d41d8cd98f00b204e9800998ecf8427e for example. The question is how critical it is that Google returns a non-useful result vs a page that says there are no results. I think most people don't care; it's not useful either way.
As an aside, for those curious, it's the MD5 of the empty string :)
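Anyone can confirm that with Python's standard `hashlib` module; hashing zero bytes of input gives exactly the 32-character hex digest quoted above.

```python
import hashlib

# MD5 digest of zero bytes of input, rendered as 32 hex characters
empty_md5 = hashlib.md5(b"").hexdigest()
print(empty_md5)  # d41d8cd98f00b204e9800998ecf8427e
```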
I search IDs all the time. Google is becoming more and more worthless.
> But it’s clearly not what people want. Ask any person if a search for a hex encoded ID should be a fuzzy match for a different ID and the answer will be no.
In a search field explicitly for hex encoded IDs, it shouldn't be.
In a generic web search that has to guess whether my term was a hex encoded ID ('cafe' is valid hex but almost certainly isn't intended as one), it's less obvious.
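A rough Python sketch of the guess such a search engine would have to make. Everything here is an assumption for illustration: the regex, the 16-character minimum, and the digit-plus-letter rule are made-up heuristics, not how any real engine classifies queries. The length table only notes the hex lengths of a few common digests.

```python
import re

HEX_RE = re.compile(r"[0-9a-fA-F]+")

# Hex lengths of some common digests (SHA-1 also covers BitTorrent v1 info hashes in hex form)
KNOWN_DIGEST_LENGTHS = {32: "MD5-sized", 40: "SHA-1-sized", 64: "SHA-256-sized"}


def looks_like_hex_id(term: str, min_len: int = 16) -> bool:
    """Guess whether a search term is a hex identifier rather than a word.

    min_len is an arbitrary threshold: short strings such as 'cafe' are
    valid hex but are more likely meant as words.
    """
    if not HEX_RE.fullmatch(term) or len(term) < min_len:
        return False
    # Require at least one digit and one letter; an all-letter string is
    # probably a word and an all-digit string is probably just a number.
    return any(c.isdigit() for c in term) and any(c.isalpha() for c in term)


def describe(term: str) -> str:
    if not looks_like_hex_id(term):
        return "probably not a hex ID"
    return KNOWN_DIGEST_LENGTHS.get(len(term), "hex ID of unusual length")


for t in ["cafe", "d41d8cd98f00b204e9800998ecf8427e", "TPS562201"]:
    print(t, "->", describe(t))
```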
In the case of a clear hex encoded ID of sufficient length, I would like to know there are zero exact results, but as long as it's still fast, I would love some fuzzy matches after that in case there was a typo in my term or in the indexed document.
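The behavior described here (report zero exact hits first, then offer close matches) can be sketched in a few lines of Python. The index, the function names, and the two-edit typo budget are all hypothetical; plain Levenshtein edit distance is one reasonable choice among many, not what any real search engine is known to run.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def search_ids(term: str, index: list[str], max_typos: int = 2) -> tuple[list[str], list[str]]:
    """Return (exact hits, near misses within max_typos edits, closest first)."""
    term = term.lower()
    exact = [doc_id for doc_id in index if doc_id.lower() == term]
    near = sorted(
        (levenshtein(term, doc_id.lower()), doc_id)
        for doc_id in index
        if doc_id.lower() != term
    )
    fuzzy = [doc_id for dist, doc_id in near if dist <= max_typos]
    return exact, fuzzy


# Hypothetical index built from IDs mentioned in this thread
index = ["ca7112b7167c15e621412c0fbc0a6c97", "9b362510c100095f02cf3cad9e365ea6"]

# The query with its last digit dropped: no exact hit, one near miss
exact, fuzzy = search_ids("ca7112b7167c15e621412c0fbc0a6c9", index)
if not exact:
    print("No exact matches.")
for doc_id in fuzzy:
    print("Near miss:", doc_id)
```

Keeping the two lists separate is the point: the caller can state plainly that nothing matched exactly before showing anything fuzzy.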
>the world’s greatest search engine still doesn’t understand that you can’t fuzzy match an ID?
No, not without telling it to run an Exact Match search by enclosing the string in quotes.
Meanwhile if I search for a specific Bosch solenoid part number there's a 50% chance that any one result will point to some different part number that contains 90% of the digits - even though the specific part number actually exists!
Same for electronic part numbers. Search engines will just go "eh, pretty close" and mix in results for, say, TPS562201 with those for TPS56221.
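Python's standard `difflib` shows how "pretty close" such pairs look to a generic string-similarity measure. `SequenceMatcher.ratio()` is used here only as an illustrative stand-in; there is no indication that search engines use it. The second pair is the refrigerator model number example from later in the thread.

```python
from difflib import SequenceMatcher

pairs = [
    ("TPS562201", "TPS56221"),  # electronic part numbers from the comment above
    ("B9GDSIGH", "B9GDSIGY"),   # refrigerator model numbers from later in the thread
]

for a, b in pairs:
    # ratio() = 2 * matched characters / total characters in both strings
    ratio = SequenceMatcher(None, a, b).ratio()
    print(f"{a} vs {b}: {ratio:.3f}")
# TPS562201 vs TPS56221: 0.941
# B9GDSIGH vs B9GDSIGY: 0.875
```

Both pairs score close to 1.0 even though they name different parts, which is exactly the failure mode being described.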
I get that that's the default now, but can't help but hate it. When you search for something like `dog house` and get a bunch of results for just house (marked "Missing: ~dog~"), it's so dumb. Why would I have typed dog unless that was important to me??
This sets things up for all sorts of problems when people don't notice that the IDs aren't exactly the same.
At the moment, Google happens to be choosing car dealers as a fallback, but what if it instead fell back to a page "transaction a67cedf has been confirmed"?
Garbage in, garbage out is fine here, no? I hate Google as much as the next person here, but this seems like a non-issue. If I type in a random string, it should be assumed that I'm searching for something.
Sometimes you really do want exactly that "random" string. This is common with error messages, model numbers, build hashes, etc. If I'm searching for B9GDSIGH as the model number for my refrigerator, I really don't want to see B9GDSIGY.
But if it links to the B9GDSIG series refrigerator, which has the 240v H and 120v Y subtypes, then it would be correct in suggesting that?
Same with error messages - they often have timestamps, or local object IDs/memory addresses, which you also want to be fuzzy-matched.
I think the issue is the de-emphasis of "power" modifiers for google - it's less obvious how to say "This part of the string needs exact match, this can be fuzzy"
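One way to express "this part must match exactly, this part can be fuzzy" is to treat quoted phrases as hard filters and everything else as ranking hints. A minimal Python sketch of that idea; `parse_query` and `matches` are hypothetical helpers, and a simple case-insensitive substring test stands in for real fuzzy scoring.

```python
import re

QUOTED = re.compile(r'"([^"]+)"')


def parse_query(query: str) -> tuple[list[str], list[str]]:
    """Split a query into quoted phrases (must match exactly) and loose terms."""
    required = QUOTED.findall(query)
    loose = QUOTED.sub(" ", query).split()
    return required, loose


def matches(doc: str, query: str) -> tuple[bool, int]:
    """Return (eligible, score): eligible only if every quoted phrase occurs
    verbatim (case-insensitively) in doc; score counts loose terms that occur."""
    required, loose = parse_query(query)
    text = doc.lower()
    eligible = all(phrase.lower() in text for phrase in required)
    score = sum(term.lower() in text for term in loose)
    return eligible, score


q = 'refrigerator "B9GDSIGH" manual'
print(parse_query(q))                                      # (['B9GDSIGH'], ['refrigerator', 'manual'])
print(matches("B9GDSIGY refrigerator manual", q))          # (False, 2)
print(matches("Manual for the B9GDSIGH refrigerator", q))  # (True, 2)
```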
In that case, click the "must contain" link and it resubmits with the query wrapped in quotes. Or, just quote the query yourself on the first go if you know it must match
Google no longer respects quotes (and hasn't in a while). It's very hard to get Google to actually say there aren't any results even when in fact there are no matching results.
They respect it when they submit it themselves, then, as every time I've used that function and seen them update the query with quotes, it comes back with different results. I've never cared to look at the search query in the URL, so maybe they also add an additional parameter that tells the back end specifically to obey the quotes on this resubmission? So at some point, the quotes aren't ignored.
that's not my experience.
https://www.google.com/search?q=%22kgirbudidndijrjjr%22 gives me "Your search - "kgirbudidndijrjjr" - did not match any documents.", at least it will until they index this comment and find kgirbudidndijrjjr
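The `%22` in that link is just a URL-encoded double quote. A short Python sketch with the standard `urllib.parse.urlencode` builds the same exact-match search URL for any term.

```python
from urllib.parse import urlencode

term = "kgirbudidndijrjjr"
# Wrap the term in double quotes (the exact-match operator) before encoding;
# urlencode percent-encodes each quote character as %22.
url = "https://www.google.com/search?" + urlencode({"q": f'"{term}"'})
print(url)  # https://www.google.com/search?q=%22kgirbudidndijrjjr%22
```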
Quotes are more like guidelines these days.
on the advanced search, there's still the option to specify that it 'must contain' something, but I'm not sure if it's just a suggestion like quotes or not.
I "love" how we've reached a point where we so distrust not just this company specifically but dark-pattern UIs in general that we almost anticipate placebo-like buttons.
One man's trash is another man's treasure. Search is ambiguous enough by nature IMO. No liberty zone!
Agree with the peer - specificity matters. Model numbers are a good example. I feel like I've developed a weak form of dyslexia because I can't trust Google like I once did.
Things I want fuzzy searches for... will be presented fuzzy. Not as an opaque string of usually-quoted characters, but wrapped in keywords
A reply makes a good point - double quotes don't seem as effective any more.
I miss when Google had thousands of results, and you could browse past page 5. Now it just lies to you.
Is there any way we can find out if that is true?
I could have sworn Google was always happy to return some odd URL matches, typically when the given results weren't great.
I remember when Googlewhacks[0] used to be a thing. Zero result search queries weren't interesting enough because they were too easy to find.
I've seen it come back with something along the lines of "it looks like there aren't a lot of matches" with some useless cartoon graphic.
I see this a lot when searching for phone numbers. I've also seen the opposite, like the forced "find something, no matter how terrible a match, to avoid no results" behavior being described. You search for a number and get no exact matches, but it returns things with different area codes, same prefix, different numbers. Or same area code, different prefix, same numbers. Or some such randomness where I can't even venture a guess as to why it thought results in which not one number matches would be interesting to me. Unless you're brave, I'd suggest not searching for random phone numbers with Safe Search off, as you'll find some very interesting pages displayed that have absolutely nothing to do with the number being searched.
There was at one time a kind of game where you tried to find a search term that would return only say 3 results. It was hard, but some did get found.
Having said that I have recently had some kind of "nothing found" result on several occasions. So it still happens.
--edit--
In fact I just tried "ca7112b7167c15e621412c0fbc0a6c9" (omitting the last digit to avoid HN) and got:
Your search - "ca7112b7167c15e621412c0fbc0a6c9" - did not match any documents.
Suggestions:
Make sure that all words are spelled correctly. Try different keywords. Try more general keywords.
Google-Whack as I knew it.
Where you tried to find a search that returned only one result.
Unless you have a time machine there's only anecdotal evidence, but there's plenty of it on HN. Seen many comments here reporting the same thing.
Just do an image search for "google search returned no results screenshot". Plenty of examples.
I can't tell you the number of times I've searched for random serial numbers and gotten the exact product I seek. I'm glad Google indexes this random crap.
An experiment would be to create high quality, non-commercial websites with pages containing these hex strings and see if the pages appear in Google SERPs.
The fact that Google returns car dealerships when the user is searching for hex strings is telling.
That doesn't sound right to me: Google used to suppress results with string matches?
Why?
If so, would that be a good thing?
Why shouldn't I be able to find the vehicle via its ID?
These aren't string matches. Check again.
Ah, doh, thank you.
> no exact match but this site has a bunch of strings with most of the same characters
I suspect it's something similar, but more like partial string match which may score as "close enough to display". I get consistent results with the same hex string - dealerships - but if I quote it (exact match), I get no matches.
I suspect there's a single word embedding for WTF_IS_THAT.
I DO NOT BUY IT. Plenty of sites use unique identifiers and other random hex strings all over, e.g., fingerprinted assets. If your explanation were accurate I would expect more kinds of sites to show up
Additionally, the user is doing the search in a non-Incognito session, so the system will bias based on assumption of user preferences. "Hm, I see this random hex identifier in three pages... Oh, but this user likes cars. Let's give 'em the car result first."
> Edit: And to add to this, I'd surmise that the reason you see a lot of car dealerships in these results is that they sell a lot of one-offs - instead of having a list of SKUs in inventory, they sell a unique vehicle just once, so the inventory systems need to account for that by using long strings as item IDs and the like.
If only there were some sort of standardized identification number for vehicles
Bing search results for that are interesting
Repro'd in an incognito window, so it's not a history thing. First 3 of OP's strings if anyone else is experimenting (remove spaces):
3344cfb4 78ead204a49b88 1da6079adf8a
e2c75c64 eef8087f6f36df 57
eb944335 73626fe9b73550 b02a651620d8
--Shoot, depending on crawling, this may end up causing this page to match. I'm injecting spaces above to deter this, but maybe it'll also prove out the partial string match theory...
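If you want to try these yourself, stripping the spaces back out is a one-liner. This Python sketch (`rejoin` is just an illustrative helper) checks that only hex digits remain and prints each string's length rather than the string itself, so running it doesn't undo the point of the spacing.

```python
def rejoin(spaced: str) -> str:
    """Remove whitespace that was inserted to keep the string out of search indexes."""
    return "".join(spaced.split())


spaced_strings = [
    "3344cfb4 78ead204a49b88 1da6079adf8a",
    "e2c75c64 eef8087f6f36df 57",
    "eb944335 73626fe9b73550 b02a651620d8",
]

for s in spaced_strings:
    joined = rejoin(s)
    int(joined, 16)  # raises ValueError if anything but hex digits remains
    print(len(joined), "hex characters")
```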
I'm only getting back 2 results: Citi.com and FDIC.gov
Clicking on the 3 dots gives me this info:
Your search & this result: "This result seems relevant even though this search term may not appear: 3344cfb478ead204a49b881da6079adf8a"
Most likely some part of the string matches the VIN number. Dealers are legally required to post the VIN of an actual vehicle in any advertisements that have a price, as a way of preventing bait-and-switch.
Funny, in Europe that's absolutely not the case.
I watched some government sale, and they posted a PDF of forfeited vehicles for sale.
The VINs were there, but parts of them were blacked out.
It was a PDF. I copy-pasted the text behind the black box and got the full VIN.
In Europe, VINs of cars are treated a little like SSNs are treated in the US. Some governments assume that just because you know the VIN of a vehicle, you must be its owner, despite many vehicles having the VIN written on every bit of glass and visible without even unlocking the car...
In the US, the VIN must be visible from the outside of the vehicle, through the windshield on the driver's side. Covering the VIN from view is illegal.
> It was a PDF. I copy-pasted the text behind the black box and got the full VIN.
You're such a hacker. As the world turns now, I'd expect some legislation that says if you copy the text from a badly created PDF, then you are the one to blame and not the one that made the bad document. You're clearly circumventing the intent. You you...criminal.
And yet they still bait and switch. Most recently-ish with added markups not in their online price.
Or just claiming the vehicle is currently unavailable or not yet for sale because it's in the shop/in use as a loaner/the manager has a hold on it or some BS, but here's a very similar vehicle that we'd love to unload on you!
It's very technically legal because they do have the vehicle in their inventory, and you can test drive and buy it, but just not right then.
Good guesses in the comments so far: VIN number partial matches and targeted search. Anyone going to test what's correct?
Ideas:
1. VINs are 17 characters and don't contain I, O or Q, to prevent confusion with the digits 1 and 0. If you throw in lots of these letters, always spaced less than 17 characters apart, do you get fewer hits?
2. Does a VPN and/or private browsing affect the results?
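A quick Python sketch for idea 1: it checks a candidate VIN's length, its alphabet (no I, O or Q), and the check digit in position 9 that North American VINs carry, then scans a longer string for 17-character windows that pass. Every hex character (0-9, A-F) is a legal VIN character, so on hex input the alphabet rule filters nothing and only the check digit does. `is_valid_vin` and `vin_like_windows` are illustrative helpers; whether Google does anything VIN-specific is exactly the open question.

```python
# Transliteration table for the North American VIN check digit.
# I, O and Q are absent because they are not allowed in VINs.
TRANSLIT = {
    **{str(d): d for d in range(10)},
    "A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8,
    "J": 1, "K": 2, "L": 3, "M": 4, "N": 5, "P": 7, "R": 9,
    "S": 2, "T": 3, "U": 4, "V": 5, "W": 6, "X": 7, "Y": 8, "Z": 9,
}
# Position weights; position 9 (the check digit itself) has weight 0.
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]


def is_valid_vin(vin: str) -> bool:
    """Check length, alphabet, and the position-9 check digit."""
    vin = vin.upper()
    if len(vin) != 17 or any(c not in TRANSLIT for c in vin):
        return False
    total = sum(TRANSLIT[c] * w for c, w in zip(vin, WEIGHTS))
    remainder = total % 11
    expected = "X" if remainder == 10 else str(remainder)
    return vin[8] == expected


def vin_like_windows(s: str) -> list[str]:
    """All 17-character substrings of s that pass the VIN check above."""
    return [s[i:i + 17] for i in range(len(s) - 16) if is_valid_vin(s[i:i + 17])]


print(is_valid_vin("1M8GDM9AXKP042788"))  # True (a widely used example VIN)
```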
A third possibility is that Google has a cheaper ad category for search queries that they can't categorize. This doesn't explain the diversity of dealerships, though.
Sure, it's matching VINs. But in the vast expanse of the net, surely there are many strings of random hex out there. Why this source of random digits?
Probably the most authoritative sites with weird strings
"most authoritative", for financially preferred values of authority.
Mercedes uses 18-digit VINs, tho I believe it’s the same format and checksum algorithm for the first 17 digits, so it’s really more “17 + 1 digits”. Still drives validators nuts tho.
The word embeddings computed from the hex values and the car dealership's inventory IDs probably have close similarity in Google's vector DB.
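Real text embeddings come from learned models, but the intuition can be shown with a toy: represent each string as counts of its two-character chunks and compare vectors with cosine similarity. In this hypothetical Python sketch, `bigram_vector` and `cosine` are illustrative helpers, and bigram counts are a deliberately crude substitute for whatever Google actually uses. Two unrelated hex strings still share a handful of bigrams, while ordinary English text barely overlaps with hex.

```python
import math
from collections import Counter


def bigram_vector(s: str) -> Counter:
    """Counts of overlapping character bigrams, a crude stand-in for an embedding."""
    s = s.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


query = "ca7112b7167c15e621412c0fbc0a6c97"
candidates = {
    "hex image name": "9b362510c100095f02cf3cad9e365ea6",
    "english text": "used cars for sale near you",
}
for label, text in candidates.items():
    print(label, round(cosine(bigram_vector(query), bigram_vector(text)), 3))
```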
I like that theory, but with one slight modification.
There's a single word embedding for DARNED_IF_I_KNOW, and, statistically, automobile listings outnumber other pages with the DARNED_IF_I_KNOW token.
Weird premise. I search for random hex literally all the time (checking hashes and guessing algorithms as a part of my reverse engineering work) and I don't remember car dealers coming up especially often. I suspect it's just the author who, because of their location or previous search history, gets more targeted car dealership ads.
But the results here aren't ads - at least they appear to be regular search results.
I think this is notable just because it's a result of Google now having every single search result set be trying to sell you something. That's different from simply having targeted ads and rather disturbing.
Google is now a glorified Yellow Pages, assuming that every search is a search for a business.
I see that DIGITS is between 10 and 19:
DIGITS=$((10 + $RANDOM % 10))
If it was always an even number, I would have expected some checksum files to be matched (16 bytes for md5sum, 20 for sha1sum, etc.). I'm going to guess that Google makes more money from car dealer ads than it does from programmers searching for hex codes. It's also probably just because Google search gives more and more irrelevant results.
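For anyone who wants to generate comparable test queries, here is a Python sketch of the same length rule. It assumes DIGITS is a count of random bytes that then get hex-encoded; that is a guess, though it fits the 24- and 34-character strings quoted earlier in the thread. The rest of the video's script isn't shown, so `random_query` is a hypothetical reconstruction.

```python
import random
import secrets


def random_query() -> str:
    """Mimic DIGITS=$((10 + $RANDOM % 10)), then emit DIGITS random bytes as hex.

    ASSUMPTION: treating DIGITS as a byte count is a guess, not the script's
    confirmed behavior.
    """
    digits = 10 + random.randrange(10)  # 10..19 inclusive, like the bash line
    return secrets.token_hex(digits)    # 2 hex characters per byte


for _ in range(3):
    q = random_query()
    print(q, len(q))
```

Under this assumption the hex length is always even and falls between 20 and 38 characters.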
I tend to get variations on cryptocurrency block explorer websites mostly.
It's annoying when I want to search for a btih or something exact.
Weird... Maybe Google thinks that the closest inexact match is a VIN number?
Search bubble?
Perhaps Google trolls anyone in security or torrenting, and would instead prefer to show CPM/CPC ads to charge instead of nothing because money. /s