A/B testing gets misused to juice metrics in the short term
zumsteg.net"As an experiment, I went through a list of holiday weekend sales, and opened all the sites. They all — all, 100% — interrupted my attempt to give them some money."
This is a good touchstone to use for "you've overoptimized your site, tone it back". I am also taken aback every time I'm on a site, I've got something in my shopping cart, I'm headed for the "check out" button, or I'm even on the checkout page, and some stupid interstitial pops up. Dude, I'm trying to enter my credit card information! Back off! Especially stupid for a "sign up for our newsletter" popup; we all know that unclicking the "yes, we can email you every 17 seconds from now until the heat death of the universe with valuable offers from 'our affiliates' which we define as 'anyone we share a species with'!" box on the checkout form is mandatory, and if we don't see it immediately we'd best go hunting for it. You've already defaulted the checkbox to "yes" on this very screen, get out of the way!
Less unbelievably stupid, but related, is when I'm examining product X and just after I scroll down a bit to read more you pop up something related to... well... anything other than product X! I'm signalling interest in product X as hard as I can, and you've AB tested that this is a great time to jangle your keys over there instead? Your AB testing is stupid, and that result can't possibly be anything but some stupid statistical fluke or other terrible error. What fisherman goes out on his boat, hooks a fish, and then rushes to throw another completely different lure out to the hooked fish to get it on that hook instead? This is another good touchstone for being "overoptimized".
Speaking of 'sign up to our newsletter', one of the latest dark patterns I've found that astounded me was adding a checkbox to the login form [0], where you'd normally expect the 'remember me' checkbox to be. You almost click it out of muscle memory if you don't read what it says.
[0] https://i.postimg.cc/HW89hs7r/Screenshot-2022-07-12-145957.p...
Email marketing in general blows my mind. Marketers typically have absolutely no respect for consent, and the costs are completely borne by the recipient. The whole industry depends on dark patterns, shady list sharing, and scraping your email to add it to their lists despite you having no relationship with them. I know it's not simple, and it's just my frustration speaking, but I don't understand how my mail host can't ban all Mailchimp et al IPs for me, or implement some standard such that it costs them a penny to send me an email.
Beware that Meta/FB and TikTok scripts are among those that siphon off email data even before a web form is submitted.
https://arstechnica.com/information-technology/2022/05/some-...
This is a good reminder to keep your hosts file updated to block at least some of these sites' attempts to take your data.
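For anyone who hasn't set this up: a hosts file entry just points a tracking hostname at a non-routable address so the script never loads. A rough, purely illustrative sketch (the two domains below are only examples; a maintained blocklist such as the StevenBlack/hosts project on GitHub is a better source, and the list changes over time):

    # /etc/hosts (Linux/macOS) or C:\Windows\System32\drivers\etc\hosts (Windows)
    0.0.0.0 connect.facebook.net
    0.0.0.0 analytics.tiktok.com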
> Email marketing in general blows my mind
My favorite thing is when the companies outsource the email marketing, so that it has absolutely zero relevance. I've been using the same online tax preparer for 10 years, and I've had exactly zero refunds, yet their emails during tax season always let me know that "my refund is waiting".
The point of email marketing is to hit the right targets. If they overspray and hit 100,000 wrong targets... oh well, they lost 3 cents. So this overspray isn't a sign that they are dumb; it's a sign that they are effective.
That's why the only email subscription service on my site is completely transparent and details exactly what we're storing, and it's impossible to click by accident. Everything I do for myself I try to build like a service I'd like to use myself. But the second managers or marketers are involved it's all out the window. I remember early in my career my boss had me add every email included in a TED booklet to their marketing email list. I told him that morality issues aside he could likely catch a fine for that, especially since the type of people listed in a TED booklet are likely more litigious than the average bear. Didn't care, wanted more eyes on the marketing.
> That's why the only email subscription service on my site is completely transparent
Sorry, but honestly speaking, even checking your profile does not reveal what your website is. From your words you sound like a marketing person, but that's not clear.
Hope this feedback helps!
That's intentional, I'm not trying to plug my website I was just talking about my approach in non-specific terms. I'm a dev.
I'm not a big fan of ipv6, but fan or not, I bet if all spammy mailchimp type provider IP ranges were confiscated and freed, we'd be in ipv4 land for another 20 years.
And as a second thought, the way China and Russia are going, maybe we should just reclaim all their ipv4 addresses, and just give each country 1 IP, they can proxy through it on their end.
Then say goodbye to the internet and hello to a mesh of country-specific networks.
Finally, true decentralisation!
? That's my point: this is already China, and Russia is not far behind.
We already have that. It’s just we are in denial as a society about it.
Globalization is over. The post-Berlin-Wall consensus has been undermined and ruined.
The last to realize, loses.
The quickest way to achieve defeat is to convince yourself that you have already lost.
One address should be enough for everyone? Bill Gates just reached zen.
Mostly, it doesn’t even seem to matter whether you agree or not. Inevitably you end up receiving affiliate emails regardless.
This is why every company gets a different address for me. If junk starts coming in, that address is blocked, and I stop doing business with that company (if I haven't already).
This sometimes falls foul of spammers adding some random addresses of the form blahblah@mycatchall.domain.tld or <commonname>@mycatchall.domain.tld into their lists, but that hasn't happened often enough to be a problem. That it isn't much of a problem surprises me a little, given how much <commonname>@domain.tld (no sub-domain) addresses are used this way. I have considered trying the pattern somename@<sub-domain-per-company>.domain.tld as an alternative if that becomes a problem, but before implementing that I need to change my email setup (doing that anyway soon as running Zimbra's OSS version is going to get more difficult next year) and maybe my DNS server of choice (if wildcard MX records are an issue, I've not looked into that).
Sometimes I get funny looks for addresses like this, especially as I usually work the company/other name in there somewhere. I had one website refuse to accept an address based on their name, which was a red flag and I backed away from going any further into dealing with that organisation.
You can just use [id]-[sha1 hmac]@domain.tld
The id could be anything, and the SHA1 HMAC takes 32 characters in base32 (which is an email-address-safe encoding). Then just configure your spamfilter to reject any address where the HMAC doesn’t check out.
Of course, the drawback is that you’ll need a computer to generate a new address… At which point you may as well store an explicit whitelist of valid addresses.
You can do that from a phone app though, or even a static webpage that loads on phones. It doesn't have to be 32 characters either, putting ~4 characters is probably secure enough, unless a ton of people start using this exact scheme.
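To make the scheme concrete, here's a minimal sketch in Python. Everything here is illustrative (the secret, the 8-character truncation, and the helper names are assumptions, not anything from the posts above); the validation function is the part you'd wire into the mail server, e.g. via a milter or sieve-style hook.

    import base64
    import hashlib
    import hmac

    SECRET = b"long-random-secret"   # assumption: a key only your mail setup knows
    DOMAIN = "domain.tld"
    TAG_LEN = 8                      # ~40 bits; full base32(HMAC-SHA1) would be 32 chars

    def tag(identifier: str) -> str:
        # Truncated base32 of HMAC-SHA1(secret, id), as described above.
        digest = hmac.new(SECRET, identifier.encode(), hashlib.sha1).digest()
        return base64.b32encode(digest).decode().lower()[:TAG_LEN]

    def make_address(identifier: str) -> str:
        return f"{identifier}-{tag(identifier)}@{DOMAIN}"

    def is_valid(address: str) -> bool:
        # What the mail server hook would check before accepting delivery.
        local = address.split("@", 1)[0]
        if "-" not in local:
            return False
        identifier, candidate = local.rsplit("-", 1)
        return hmac.compare_digest(candidate, tag(identifier))

    # make_address("acmecorp") -> something like acmecorp-ebd2xxxx@domain.tld
    # is_valid() rejects any address whose tag doesn't check out, so spammers
    # guessing random local parts bounce at the server.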
This is a great idea, I might implement it. The difficult part is hooking the validation in the mail server I guess.
I typically do it with
<emanynapmoc>@mydomain.tld
Spelling the company's name backwards makes it easy to match to a company for use by my own spam filter without setting off their pattern detectors.
> If junk starts coming in, that address is blocked, and I stop doing business with that company (if I haven't already).
AND CALL them, if possible: “I’ve received marketing emails from your company recently, how is this possible, I’ve never signed up, yaddayadda… “
Generate some cost on their side.
Support calls will probably cost you more than they cost the company, unless you value your time very little - most companies don't post numbers answered by anyone paid highly enough for this to have an impact. Instead, call them out publicly, or mail/call someone in a leadership position if you can find contact information.
You don't have to use the company name in the email address, just use a unique email. If you start getting spam on a certain email address, search your email archives to see what company was associated with that address to link it back to them.
Yeah, my mortgage company sent me a letter saying that to opt out of affiliate marketing emails or snail mail I have to send them a letter requesting it. This was 4 months after they bought my mortgage, so the most the letter woulda done is stop them after selling my info for 4 months.
I had a coworker who sent those letters as a side hustle. He had a few different ones and would send the letters certified mail. Companies are very poor at compliance, and certain violations allow you to sue the company.
Everything in that except the unspoken end point where I end up with money sounds awful
"Remember me in your newsletter list" is the next one. Send me money!
Yes! I just noticed this for the first time yesterday and thought, "I hope this isn't another terrible trend in dark patterns."
Heh. I’ve stopped clicking remember me boxes because they never work.
Yeah, I booked a flight on WizzAir two days ago, and this felt like a low blow even from WizzAir.
Predicting the flight will be cancelled in 3.. 2.. 1.. But then the newsletter will haunt you for far far longer..
Not a burnt WizzAir customer at all! /s
Happened to my dad on Sunday, and the only replacement flight they would offer is for this Friday, what a complete joke air travel is in 2022.
I mean you're complaining about an ultra-low cost airline. why would you expect it to have a good customer experience?
Yes I am. They also overbooked seven people on that flight, delayed it for hours, and then cancelled it completely. There were people in wheelchairs left stranded at the airport after waiting there all day. This is just plain incompetence.
atlassian does this as well: https://i.postimg.cc/zfwbG5Ft/atlassian-login.png
I knew Atlassian hates their users. But that much??
That's one of the things I REALLY dislike about GOG lately. It tries really hard to bait you into signing up for the newsletter when buying stuff.
I never check "remember me" so maybe that's good for me?
A lot of this happens because different managers have different metrics/KPIs they are optimizing around and they all find "good places" to do things to help meet their goals. The secondary effects aren't considered. One manager's quest for outperforming their goals comes at the expense of another manager's goals.
There was a point a few years ago where you could not see a single piece of user-generated content above the fold on the reddit home page. A bunch of teams had jockeyed for having their little carousels and banners put on top, and of course, metrics were always cited.
I left a screenshot in slack and it ended up causing a couple of teams to have to roll back their widgets, but it always baffled me that we were able to focus so much on the individual trees of metric optimization that we would miss the forest to that extent.
> but it always baffled me that we were able to focus so much on the individual trees of metric optimization that we would miss the forest to that extent
Always look to the decision maker's incentives and you'll almost always discover why things are the way they are. And often, to your point, there's an aspect of tunnel vision associated with it because considering the bigger picture is difficult as a company grows and becomes more complex and creates friction in achieving goals.
Ultimately, this is the purpose of senior leadership. But the Peter Principle really begins to kick in at that level, and the truth is, many senior leaders are in over their heads and are unable to realize the broader strategy or understand how their various units are affecting it. So we end up with crappy products.
Gervais principle says that senior leaders work for themselves, not the company, up to the point of working against the company.
That was an interesting rabbit hole I hadn’t been down before. Interesting!
The best metric is the end-of-year bonus, tied to yearly total company financial results, but that only gets measured once per year. I can measure many things on every transaction, but how they in total work out to my end-of-year bonus and paycheck is much harder to see.
Of course if my bonus is some small KPI I can optimize that at the expense of overall performance.
It seems like every company goes through some version of this. At Twitter there was a channel called #ios-six-bars or something like that that started when an engineer posted a screenshot of the home timeline with six bars of things on it, all from jockeying teams trying to grab a spot on that page: Home, Spaces, all the new features just had to be thrown in the face of somebody who probably just wanted to read some tweets. Discussions were had, product cohesion was brought up, then things went quiet for a bit. Until someone posted a new screenshot a couple months later with seven bars on the screen.
This is why any good org will make sure to observe all important KPIs while doing an A/B test. If your "email signup" KPI went to the moon but your "bought shit" metric tanked… you should probably roll back.
It's really easy for this to be noise. On an A/A test with five guardrail metrics and a significance threshold of p < 0.05, at least one guardrail will spuriously appear to move 22.6% of the time (1 - 0.95^5).
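For anyone who wants to sanity-check that figure: with five independent guardrail metrics each tested at alpha = 0.05, the chance that at least one spuriously crosses the threshold on an A/A test is 1 - 0.95^5 ≈ 22.6%. A tiny, purely illustrative simulation (under the null, each p-value is uniform on [0, 1]):

    import random

    ALPHA = 0.05
    GUARDRAILS = 5
    TRIALS = 100_000

    # Analytic familywise false-alarm rate for independent metrics
    print(1 - (1 - ALPHA) ** GUARDRAILS)   # 0.2262...

    # Monte Carlo check: an A/A test with 5 guardrails, repeated many times
    hits = sum(
        any(random.random() < ALPHA for _ in range(GUARDRAILS))
        for _ in range(TRIALS)
    )
    print(hits / TRIALS)                   # ~0.226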
Having experience design expertise at the executive level can mitigate this. If nobody is advocating for good user experiences, nobody is advocating for the usefulness of your online product as a whole, and it shows.
The primary, not secondary, effect of random sampling is noise.
Recently I got an account on a developer tool (Checkly) because the company I joined uses it. I then got 5 different emails from them in a 48 hour period.
Like I'm sure many users sign up then drop out of your funnel but I'm part of an organization that's a paying customer. I'm already going to use your stuff. What possible business benefit could there be to you spamming me? If anything you're risking the inverse - it made me want to migrate away from the tool.
Sounds like they are optimizing for a KPI on time to full integration. Someone else paid for it, now they want to make sure that you are actually using it.
Still absurd, but I know this is a problem friends of mine have had.
The "here's how to use the product you paid for" emails that trickle out over a few weeks or a month is vastly different than "throw you into every single marketing email bucket we have" but companies seem to lean toward the latter.
The 'cost' of email is borne by the recipient, mostly.
Hey CSMastermind, I'm a founder at Checkly and I got a ping we were mentioned. We do send out some "getting started" messages on autopilot. We also did a product launch Thursday and then our regular changelog on Monday. That probably was overwhelming. If you could email me on tim -at- checklyhq -dot- com I will track down if we hit the spam cannon too hard.
I know this is well-intentioned and not an automated message, but I find it ironic that you managed to get an additional message to him over here.
one asking them to establish contact for communication to continue, no less. xD
I can't help be reminded of those "if you'd like to unsubscribe call us so we can harass you with offers even more" you see in some company terms and conditions.
> if we hit the spam cannon too hard
Maybe consider, oh I don't know, not deploying a spam cannon in the first place?
The good news is that this sort of thing is enough to trigger the spam filter in my email program so I'll never see it.
Companies have this astonishing arrogance assuming they should be able to interrupt actual humans with robot messages from noreply@crapcorp.com - typically marketing messages that cannot actually be replied to and do not qualify as respectful communication.
>Maybe consider, oh I don't know, not deploying a spam cannon in the first place?
I'm not at all related to the people you are talking to, but I am a dev who has accidentally built spam cannons before. It is surprisingly easy to spam people - you need to deliberately build in checks - and put them in the right places to prevent spam cannons.
I don't use your tool and I'm unrelated to CSMastermind, but I can tell you that you absolutely hit the "spam cannon" too hard. 5 messages in 48 hours? That's absolutely ridiculous. They could all have been 1 message if it was important that they get that info, and it could have been 0 messages if it wasn't.
You are correct. I will figure out two things.
1. Why our fancy expensive mailing/marketing tool (Intercom) does not spread these messages in a relaxed fashion (it should)
2. If of those five messages, maybe two were the obligatory "confirm your email" and standard "Welcome to Checkly, this is what we do"
TL;DR we should not spam.
We checked our Intercom. We spammed. One user got six emails in a 5 day span. There were some separate initiatives going on. We didn't check the settings and current outgoing mail. We will change it.
BTW: all of this was done without any bad intent. It's 100% us being stupid and not coordinating and being diligent enough.
This is such an impressive series of responses. I know some folks are flipping you shit here, but I’ve seen a lot of people try to “engage” with customers, in HN and elsewhere, and I think you’ve done it really nicely. You sound like a human; you admit mistakes; you follow up. Good on you.
Send a seventh to apologize! /s
(on a serious note, good on you for taking action though. +1)
You only get one chance to make a first impression.
>You only get one chance to make a first impression.
Well, this person had 6 first impressions.
tnolet, if I get five emails from a company within 48 hours, I will set up a spam rule for them. If I really need that much help to get started, your UX design is likely not very good.
You are correct. I’m the same. Our emails are even pretty good. Our UX is pretty damn good. Somehow we dropped the ball here.
When you fix it, write up a post mortem so others can learn from your mistake!
Since it's distressingly common.
Let me propose a different possibility.
Suppose the site isn't concerned about the sale very much at all?
Suppose the thing that the site uses to reel people in, is a good deal that isn't very profitable to the site but what the site then tries to sell is a very profitable near-scam/ripoff. Scaring off half the ordinary customers becomes worth it to get even 10% of the customers buying the scam.
What seems like "poor optimization" can easily optimization for something and could be seen as "the scammification of the web".
Exactly this.
Many here are focusing on a single interaction. While the outcome of that single interaction is negative to the company, the aggregate outcome must be positive somehow, perhaps in the way you said, but it doesn't even have to be a scam or ripoff. Some products just have a higher margin and/or customer LTV.
As an individual, it is annoying, but the company is focusing only on the macro effect when it does something like this.
> I'm signalling interest in product X as hard as I can, and you've AB tested that this is a great time to jangle your keys over there instead?
If I may... I have seen data from a big retailer that shows that any user who doesn't immediately purchase an item is actually not that interested in the product on the screen. If a customer is going to buy something, they will do it promptly. Anyone else is just browsing.
YMMV, grain of salt, context dependent, etc, etc.
In this case, what I'm referring to is:
1. Clicked on page.
2. Took maybe 10 seconds to take in what is "above the fold".
3. Scrolled down to see what else there is.
4. BAM! Popup triggered by scrolling down.
While I understand what you're getting at, they do not yet have the info to know that I'm browsing or whatever. They were so excited about their stupid popup that they didn't even get that far.
I will say, generally, when I'm to the point that I'm entering credit card info, I've put up with it, but I have been chased off of sites by this use case before. Especially if that popup also crosses with some other popup and now I'm chasing down the tiny little 6pt light-grey-on-white little "x"s to click away the popups in the right order.
Actually, let me add that to my touchstone list. OF COURSE hiding the dismissal icon for the popup increases "engagement" with the popup. You don't even need to run a test for that, because what other result could it have? "We shrank the close icon, moved it to the lower right corner where nobody expects it, and made sure to kill the contrast even harder, and customers dismissed it 2.5 seconds more quickly on average"? Of course that's not possible. But... that's the wrong question! And AB testing is really good at answering the question you're asking; it has no mechanism in and of itself to see whether you're asking the right question. If you're getting down to this, you've overoptimized.
Or a popup that triggers when you move the mouse towards the top bar. I constantly highlight text for reading purposes (a habit I have), and of course that moves the mouse. Not a reason to annoy me with that shit.
They know you clicked on a page that is for a given (set of?) product(s). They know you stayed on that page for longer than a moment, then scrolled down.
They may also have data that >90% of customers who make a purchase choose/click in <10 seconds, and never scroll. Or may even take a different path entirely or similar.
They could interpret all of that as you inspected the product(s) on the page, didn't choose/click one, and _now_ is the best time to prompt you with suggestions to maintain engagement in >75% of instances.
You could also be one of the lucky few customers in a sacrificial control group unknowingly testing their new shitty popup to measure its impact on sales.
Whilst I can't say for certain that any of this is your case, I absolutely guarantee there are many retail sites that do this - I've worked at several now - and this is bread-and-butter material to data teams working with customer engagement.
In fact, you want the dismiss button to be easily discovered and used. Dismissals are an important signal about the quality of the content; just as important as clicks. When you make the button impossible to use you rob yourself of that signal while simultaneously making click data far less reliable.
Unless you're a manager, not getting the results desired.
Then the problem is people, and "they're just clicking close out of reflex!".
Cue hiding this, and results you wanted appear! Success! Raise! Promotion! Or, maybe more funding, due to signs of greater engagement.
Sure; until a company that actually understands this stuff comes along and eats that company's lunch. In the long run, reality eventually wins.
The number to figure out is how long you wait before interrupting. I also wonder if it's person dependent. Some people aren't impulsive buyers.
>I have seen data from a big retailer that shows any user that doesn't immediately purchase an item, is actually not that interested in the product on the screen.
Fuck that. Unless it's an emergency (in which case I'll go to a shop), anything I purchase online is carefully considered, sometimes over several weeks. My revolving user agent, VPN, etc. may give the illusion I'm not interested... but I am indeed, just browsing...
Could the popup be a punishment for reading the fine print?
I love to sign-up to news letters and get a discount. Of course I am giving you my spam account I set up for this exact purpose.
Honestly, instead of a cookie law I wish GDPR had imposed a rule that required all those stupid interstitial pop-ups to conform to a standard that could be easily blocked by the browser. I mean they are asking for emails, which is a massive and totally unnecessary proliferation of personally identifiable information.
I hate them so much. It makes it feel like so much more of a chore to try to do research or look for things online. I'd honestly prefer 56k page-load speeds if the pages were free of this garbage.
Pet peeve: GDPR is not the cookie law, and is in fact a very sensible collection of restrictions on how companies can collect and process personal data. The annoying banners you see are against the spirit of the GDPR, and quite possibly against the letter of it, too.
Amazon does this in a horrible way.
1) I open a product in a tab. I click "add to cart" and a "related products" sidebar slides in. I close the tab in annoyance.
However, some items exhibit a similar pattern, EXCEPT...
2) I open a product in a tab. I click "add to cart" and a stupid extended warranty sidebar slides in. I close the tab in annoyance.
The difference?
Item #1 gets added to my cart
Item #2 doesn't make it to the shopping cart.
Amazon just silently deleted my purchase.
I actually don't know when it dawned on me that this happened, but amazon lost money on me because I didn't buy certain things.
The thing is, you haven’t really shown that these sites aren’t successfully optimizing for conversions. Couldn’t it very well be the case that UI which annoys some high-intent users by interrupting them or adding steps to the checkout process also increases overall conversions?
True, A could be “annoy users” and B could be “don’t annoy users”, and A could perform better overall, but in this framework you might be missing C which is “annoy users except those already deep in the funnel”.
My point is more that there can be two groups of users with mutually exclusive desires, and it can be practical to choose to satisfy one group over the other based on your particular goals. There's not always some monotonically increasing function where you can over time satisfy a higher and higher portion of your users.
Most people are just browsing so it's optimizing for that when it should be optimizing for sales
Maybe your money is no longer the most important thing for them at that moment? Given that you’ll probably proceed with the purchase anyway, they could be making more overall from the crowd which also signs up for updates.
I would really warn you against thinking your intuitions are going to be a good sign for whether or not something is a good retail decision.
Indeed, quantitative data without qualitative understanding is useless. You can't understand data without understanding mechanisms, because there's an infinite number of possible confounding factors that you can only dismiss through your qualitative understanding of the dynamic you are measuring.
What you describe could actually be an artifact of the "flaw of averages" (an article was posted here just a few days ago; TL;DR: the air force discovered that nobody fit the body dimensions of the average pilot and therefore created adjustable seats for jets). If promotion banners are just targeted at millions of people in general, then that's exactly what will happen. But I'm not sure if better segmenting is even possible with the data available.
I've talked about this on HN but I'll say it again.
A couple years ago, after being an Android fan for the better part of a decade, I finally bought myself an iPhone and pried myself away from Google's ecosystem wherever I could. And Apple didn't even need to do any work for me to make this decision. It was the years of abuse from Google that you experience when you decide to use a Google product or service. And a big part of that was the constant A/B/C/D/E/F testing. I never felt like I was using a complete product, everything felt like a constant beta that could be changed or rearranged at any point, and I was just doing free testing work for them while they harvest all my data.
Every app update was a risk of the app rearranging itself, or features appearing/disappearing. Eventually it didn't even come from app updates in the Play Store, and new interfaces would just appear one day when a server somewhere marked your account as being in the group that gets the new UI. This app that you were familiar with could at any point be rearranged when you open it on any given day. Then maybe a week later you open it and it's back to how it was before. A button you thought was here suddenly isn't, and you question whether something actually changed or if you're losing your mind. It's a subtle gaslighting that eventually I couldn't stand any more.
To me, A/B testing means you don't respect your users. You see them as just one factor in your money machine that can be poked and prodded to optimize how much money you can squeeze out of them. That's not to say a company like Apple is creating products out of the goodness of their heart, but at least it feels like it was developed by humans who made an opinionated call as to what they thought was the right design decision, and what they would want to use. And in my 2 years of owning an iPhone, I've never opened my reminders app to find out that it's completely unrecognizable, or my messages app has been renamed or rethemed for the umpteenth time.
"To me, A/B testing means you don't respect your users. You see them as just one factor in your money machine that can be poked and prodded to optimize how much money you can squeeze out of them. "
Your perspective is extremely short-sighted. A/B testing can result in this type of behaviour but that's just poor A/B testing. Good A/B testing focuses on removing distractions from the experience and helping users derive more value from the product. Bad A/B testing tries to make things more discoverable, where discoverability is often just noise and distractions. Good A/B testing ensures that the money machine, as you put it, pays its dues to users by making the product experience delightful.
UX feedback is usually retrieved via user interviews.
I personally have never heard from a product person „Let’s A/B test whether this is delightful“. And I think that’s because delightfulness or satisfaction is impossible to quantify in A/B tests. You only get to measure things like engagement, signup rates, retention etc. - cold hard taps on the screen, and no more.
And I must say that I‘m glad that, right now, apps can’t just scan my face (or cortisol levels, or pheromones or…) for emotional clues while I read their pesky push notifications that want to coax me back into their daily active user base.
This is not really true. You can pop up surveys in the app or on the site. They can be super non-intrusive. For small effects and large user bases, you simply cannot get useful information from UX studies.
> Your perspective is extremely short-sighted.
it's the perspective of the normal users.
Every time I'm using a website and it does not behave exactly the same as for other people, or I notice some AB testing, in my head it goes "who the fuck do these people think they are?". The computing experience must be consistent and repeatable. If I wanted something that can change depending on the current position of the stars I'd ask another human, not a computer.
A/B testing can result in this type of behaviour but that's just poor A/B testing.
"You're doing it, but in some subtle but very important way that's not at all obvious to you, you're doing it wrong."
How many tech startup patterns fit that? That's a sign that either the pattern does not generalize well or it's snake oil.
I don't think so. A/B testing is just not a cure-all and alternative for good design, that is all. I mean, if the pattern is 'don't bother thinking too much about ux, just A/B test everything in PROD to death', then yeah, you are right.
However, there are many products which allow users to enter into a beta or test group, where they are the willing subjects to their experiments (in exchange for the latest shiny new stuff). This has the aspect of consent, leaving the 'stable ring' free of such variability. The fact that google and many startups are not using such consent and offering stability, doesn't mean it can't be done or isn't done.
> "You're doing it, but in some subtle but very important way that's not at all obvious to you, you're doing it wrong."
Doesn't that exact statement describe how Agile (or many other concepts, really) is used in a lot of companies? There's nothing wrong with it in principle, but practice is all over the place.
I think of it as validating and improving imperfect user assumptions.
This reads exactly like parody corporatespeak jargon.
Clearly it's the users who are wrong with their stupid assumptions and we will correct them with our mandatory weekly update.
Assumptions about the users, not assumptions made by the users.
I've never seen an A/B test where the goal was not to maximise profit for the company.
Have you ever seen "you've already bought this" on Amazon? Highly tested, reduced sales. Overruled by judgement. Stayed on the site. AB test results don't mean you leave your judgement behind.
Maybe. I find that very useful when I'm trying to buy the same thing again. It acts as a trust marker.
I do wish they kept better historic purchase data. I assume it's deleted absurdly quickly for compliance?
Huh, Amazon keeps years of order history for me and you can't even remove it, just "hide" orders which will then still be available just with an extra step. And I'm in the EU so no idea where else they would have more compliance problems with that.
Isn't that any goal of an organization? Everything ladders up to the success of the company. Especially if you're a corporation where you have fiduciary responsibility to the shareholders.
AB testing goes beyond UI. It could be AB testing algorithms for recommendations or search results. When I worked for a CPG company, we would AB test the sample sizes in the boxes, to answer the question of what the right amount of product for a sample is. We would also test shipping speed: was the upgrade to expedited shipping worth the extra cost in terms of LTV?
What would an A/B test where the goal was not to maximize profit even look like? The very act of creating an A/B test is because the worker wants to improve something in search of higher profits.
Over what time horizon? You should optimize short term metrics (for feasibility) that are (as best you can approximate) the cause of long term profits. The profits part is what makes it the business, using metrics for decisions are what makes it an A/B test. I think where people run into trouble is when they are optimizing only for this quarter’s revenue directly because long term value is too hard to measure.
Oh yeah? I've never seen a marketing department with a decent understanding of microeconomics. Most in fact are trying to maximize budget ie. expense.
That's probably intentional and a sign that it was a well designed A/B test :)
This comment is way too broad and cynical.
It reminds me of the sentiment I sometimes hear from my teenage children before I explain to them that the world isn't a Bernie Sanders/Reddit gotcha soundbite and that reality is complicated.
"Reality is complicated" is often a way to rationalize "I'm going to do it how I've always done it, regardless of how it looks."
> This comment is way too broad and cynical
Or, as anyone who has been on the internet since the 90s will tell you, bang on the damn money.
What would the other advantages of A/B testing be for a corporation, apart from widening the funnel, increasing lifetime revenue, or other such bottom-line-focused goals?
Assuming you aren’t running a charity, increasing the bottom line is always your goal. The difference between a company that burns user goodwill in the process, and one that doesn’t, is the difference between short-term and long-term thinking.
The downside with hired testers is that they are unlikely to be a statistically representative sample of your target market. I don’t think this is actually a problem for most startups or new products being launched by major players, since a lot of UX issues early in product development will be obvious to anyone who isn’t the original developers, and the number of actual users you have won’t be big enough to serve as a statistically representative sample of your target market anyway.
you can genuinely find out what message makes more sense.
explaining what your product does is hard when you’re the product expert. A/b testing ad copy helps folks parse what your product does.
If your product is actually good then having more people understand what it is, is also good.
And all of that amounts to what if no one decides to cough up money to use your product?
Let me be even broader and more cynical: everything a company does is ultimately to maximize profit.
Seems like saying no one is actually altruistic because they get good feelings from helping people.
Well, then, the reason why people hate A/B testing is because everyone does it wrong. People can write fast software in Python too but the reason it’s known as being a slow language is because it invariably gets used differently.
A/B testing is a tool, and as Deming said, the aim defines the system. In your definition of Good, you are defining the aim.
I've been in a similar situation, where I created a relatively sophisticated A/* testing and control system. My idea of good use of the system ended up being very different from how the team employing the system thought about it.
I believe that is part of the point of the post, that unintended, and even unimagined side effects plague even the best of ideas.
Do people honestly think that Apple doesn't do A/B testing?
Apple unequivocally does not do active A/B testing on their users by changing applications or the operating system out from under them.
It's almost like they use a dedicated focus/beta testing group or something, instead of making all their users join in.
I can't think of an Apple engineer who would make such a definitive statement on behalf of the company, in public. I also can't think of anyone working there who would have definitive knowledge across all apps and software, because disclosure would prevent such knowledge.
Apple introduced A/B testing to TestFlight in 2017 and more A/B testing to the App Store this year.
Source: I worked at Apple, but not in software.
Great, but what exactly does that mean for this conversation and how users perceive Apple products?
Yep- it does this in betas, but not in production apps.
You're missing GP's point. This is their experience with A/B testing as practiced. It's not about whether A/B testing could theoretically be great.
I’m not sure this addresses the core criticism of using your users as a testbed.
A/B testing is always going to be distracting to your users though.
Sometimes they'll be in A, and sometimes in B.
The button moves or disappears or appears.
Your user does not get an experience they can rely on
> Your perspective is extremely short-sighted
no, yours is. if some company wants to do some testing, they SHOULD PAY users for that. A/B testing is just exploiting users to get free testing.
> Every app update was a risk of the app rearranging itself, or features appearing/disappearing.
This shit drives my parents insane. Me too, when I have to help them. I've had to spend tens of seconds looking at a major screen in the phone app, of all things, to figure out WTF I'm looking at so I could help them figure out what was up. Re-arranged every update (or new phone) for absolutely no reason, terrible affordances, poor use of their own design language. Ugh.
I'd get them on iOS but they need larger screens and the $400 small iPhones (what I have) are already more expensive than they think a phone "should" cost, so they keep buying $200 Android phones about once a year (hoping the next one will be better) and not being able to use them because the UI is garbage.
At one point I was trying to set up my grandma with a popular video calling app on a dedicated device so we could stay in touch.
Before I could give her the freshly grandparent-proofed device, said video calling app upgraded on my parents' PC first and changed literally every single element of the UI beyond recognition. To someone the age of my grandma, that would be literally like bricking the device remotely, because none of the buttons would look the same, and she would not be able to work out how to use the new interface.
STOP CHANGING THINGS! Even if the new UI is better (debatable), some people just like or rely on a particular layout to operate the device or app. Don't rearrange without giving a ~permanent setting to use the old layout.
> STOP CHANGING THINGS! Even if the new UI is better (debatable), some people just like or rely on a particular layout to operate the device or app. Don't rearrange without giving a ~permanent setting to use the old layout.
The fact that we're 15 years into smartphones being popular and that phones & computerized address books were basically fully-solved interfaces long before those took off, and they're still fucking around with phone app UI in big ways, is a sign of some kind of institutional or industry-market failure to me. Or both. Allowed, I suppose, because Google's market position (i.e. monopoly across several markets) both gives them the surplus profit (i.e. rents) to fuck around and waste money on harmful to users crap like phone app redesigns, and insulates them from any actual threat to their profit due to those bad choices.
Well, all of these companies employ teams of ui and ux designers. They will never show up to work one day and tell their boss the ui is done. No matter how perfect the design currently is
Of course not. That's the responsibility of company management, to say "okay, the UI is good, let's divert resources elsewhere".
In the same way that we try to use semantic versioning to prevent unintended breakages in dependents of our public-facing APIs, I think we developers should start considering the UI/UX of our apps to be their public-facing API, and use the philosophy of semantic versioning accordingly.
When we rename or remove a function from an API, that's a breaking change, any dependent software will no longer work unless it's modified to take that change into account.
Similarly, when we move, rename, or remove a UI element, keyboard shortcut, or visual affordance, that should be considered a "breaking change" for our dependents, the humans on the other side of the screen. And in the same way, we should avoid making such changes unless the long-term benefit of doing so outweighs the short-term cost. Plus, users will know that moving from 7.x.x of your software to 8.x.x will require them to relearn some aspect of it.
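As a purely illustrative sketch of the analogy (the change categories below are assumptions for illustration, not an established convention), the policy could be as simple as:

    from enum import Enum

    class Bump(Enum):
        MAJOR = "major"   # breaking for humans: they must relearn something
        MINOR = "minor"   # additive: new capability they can ignore
        PATCH = "patch"   # invisible: fixes, performance, no relearning

    def classify_ui_change(moved_removed_or_renamed_element: bool,
                           changed_shortcut_or_affordance: bool,
                           added_optional_feature: bool) -> Bump:
        # Treat anything users have built muscle memory around as a breaking change.
        if moved_removed_or_renamed_element or changed_shortcut_or_affordance:
            return Bump.MAJOR
        if added_optional_feature:
            return Bump.MINOR
        return Bump.PATCH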
We should also keep around long-term support releases of older versions of user-facing applications so that people can update at their own convenience and not at a point determined by some release manager. With server software and libraries no one expects everyone to upgrade immediately to new major versions, but somehow this is an acceptable expectation of end users.
If you put a dollar value on your own time spent providing training and tech support to your parents, the iPhone options start to look much cheaper.
I've considered just getting them one but that's another new interface for them to learn. Though one that's much more stable across years.
It's doubly sad because the larger phone or iPad would be perfect, but every year it's another $200 to tell Google they're doing the right thing.
At least on iPad/iPhone you can set the apps to access Google mail, etc, which doesn't change as often, but still too often.
You nailed it. Google is constantly forcing users to relearn most of their products year after year. Give me Google products from a decade ago and I'd still be happy. Now I'm moving on from Google also. It's an untrustworthy brand.
Google recently shut off Hangouts on me with a flight (since several of my contacts reported that Hangouts worked fine for them).
It's kind of mind boggling they'd decide to do that - the replacement they direct me to (Google Chat) doesn't even have feature parity so I just dropped them and moved my social circle using Hangouts to a different app (since at this point they all faced the same problem and we decided on a different platform).
I'm really curious how the A/B testing for this went down - Google is willingly throwing customers away because somebody wants to pump numbers for a new app that is objectively worse than the old one.
At this point Google Maps is the only product that is keeping me with them, but even that one is beginning to wear thin.
> Google Chat) doesn't even have feature parity
Which features are you missing?
Video chat. It redirects me to Google Meet (yet another app!) and the model that Google Meet uses is completely backwards (create a meeting, then invite people) compared to how I use Hangouts (call someone).
It's a complete PITA and the family acceptance factor is low.
check the AB test results
Ironically, the place I'm getting forced A/B testing now is the Play Store. They moved the app-update section so it is now 2-3 more clicks away, making me have to make a separate shortcut just to reach app updates quickly.
The point, of course, is to make manually updating apps annoying so that you enable auto update. I have been burned too many times by an auto update, so I refuse.
This wasn't enough; they really want to force me to enable auto updates, to the point that 50% of the visible space in the app's update section on my big screen is covered with a message to enable auto update over WiFi. [0]
Whoever is doing this at Google... Stop. Just stop. It is cringe.
Recently they removed YouTube PiP on iOS. Then it came back on my device but not on others in the family. We pay for YT Premium. This is beyond infuriating.
A few months ago, I came to the belief that Google is the ADHD toddler of user-facing software development - absolutely unable to sit still and concentrate on anything, hence the constant UI/UX churn, half-baked products, and graveyard[1] of shiny things that they worked on for a few years before abandoning.
Google seems to be really good at making developer tools like Borg and Blaze - however, I think that as an organization they have some deficit that makes them not responsible enough to develop user-facing software (like, uh, an operating system).
Maybe Google would be better as a B2B company.
We AB tested a performance enhancement to our frontend web app to show that speed had significant benefits to the business. We used the results to justify the investment cost. We spent the next six months working on making the site faster because of it. It is a tool. How would you measure things without AB testing?
The fact that you need an A/B test to demonstrate that frontend performance has an impact (on user experience first and foremost) in 2022 speaks volumes.
But how big is the impact and does that return compare to that of other investments you aren't making because you are improving performance instead?
Yet it says nothing. Large companies are slow and stubborn.
This is the stance I take on AB tests. They are objectively good at things you shouldn't need objective evidence for.
I moved back to iPhone from around 2015 onwards (I think) because of the crap quality of the devices available, and the bloatware on top of Android.
In many cases the hardware was so poor it was hard to make a call due to the touchscreen.
Since the primary thing I want the phone to do is make a call I switched to the “it just works” camp and haven’t regretted it.
Except getting photos off the phone. Until I realised the best tool for that is … Ubuntu!
Yes, I hate Apple, but I'm starting to hate Google even more. One of these days, I might switch too.
Why can't we have nice things?
The problem is not the AB testing, as it can be a good thing to improve the experience; the problem is poorly set, short-sighted OKRs. The author points that out in the text, as he mentions many times that the leadership never asks the right questions, mainly how it will impact the client in the long run (be it NPS evolution, lifetime value, etc.).
Depends on where A/B testing is used.
If it's something one-and-done (like different permutations of a signup flow to see what is easier for users), then I don't see the harm in it.
> To me, capitalism (A/B testing) means you don't respect your citizens (users). You see them as just one factor in your money machine that can be poked and prodded to optimize how much money you can squeeze out of them.
You just described doing business in today's world.
Being a bit more generous towards A/B testing, I would make a counterpoint: _not_ doing any kind of user testing, of which large-scale automated A/B tests are just a subset, means you don't respect your users. Because it means you just assume you know what their experience is like, or worse: you don't even care about it or bother to learn something.
Your complaint seems to be more about the scale and aspect of automation honestly, and continuity of the services, which is a valid complaint against Google but not about A/B testing in general.
> Because it means you just assume you know what their experience is like, or worse: you don't even care about it or bother to learn something.
A/B tests are not the only, or even the best, way of collecting user feedback.
> means you don't respect your users.
I was just pontif- er, talking about this to someone, a couple of days ago.
I love the users of my products. Most of my products are free, and are carefully-crafted, highly-polished, complete deliverables, and I fret over how they are used -even if by a tiny number of end users-, like a nervous hen. I do what I do, out of love for the craft, and out of a genuine desire to make people's lives easier, through the technology I have at my disposal.
It is my belief that most tech companies despise their user base. Users are little more than cattle, to be fattened and slaughtered. "Caring about the user" means optimizing for "engagement," or keeping them trapped within their own ecosystem. John Oliver did a rant about this, recently[0]. It has nothing to do with actually caring about the user, or solving their problem. It is about harvesting users.
In fact, my discussion about this came about because someone wanted to keep users inside the app I'm writing, as opposed to linking them to a more familiar app on their phone (for the record, it was for videoconferencing). Linking is a "no-brainer," as I can link out to dozens of installed apps using the simple URL scheme method built into iOS[1], while "keeping them in the app" would have required several months of extra work, polluted the app with megabytes of junk code (because I'd need to use SDKs), and also killed the ability to easily scale to add new clients (contrary to popular belief, Zoom is not the only videoconferencing option). It would also have possibly put us on the hook, legally, for what happened in those videoconferences.
[0] https://youtu.be/jXf04bhcjbg?t=638
[1] https://developer.apple.com/documentation/xcode/defining-a-c...
The problem with AB testing is that it's a short-term strategy. For example, if a news site runs AB testing with headlines, they'll find that bullshit clickbait headlines get more pageviews than concise, accurate headlines, but the constant use of clickbait headlines will over time destroy overall traffic to your site. More frustratingly, sites run by smart people tend to fall into a balance where the worst articles get the most alluring headlines.
This highlights the major downside to "data-driven" policy and decisions.
Data can "lie". What is observed is not always reality, simply what we can see of it.
Consider auctions. You never actually "see" the bidder's demand or utility. Yes, there are some ways to structure auctions that in theory show willingness to pay and such (ignoring confounding factors and irrationality), but you don't actually observe anything beyond the bid.
Similarly, on websites, you don't always know the causal reasons people click here or there. You know perhaps enough to predict a step-wise behavior, but don't (usually) understand the full behavioral lifecycle -- especially if a metric improves but at the hidden cost of decrements to conversion and similar.
There’s nothing about AB testing that requires you to use short-term metrics. I used to manage AB tests for online dating sites (OkCupid, Grindr) where subscription revenue is what matters, and the gains of any strategy will take months to materialize. We were well aware that, say, raising prices would yield more short-term revenue at the expense of long-term revenue. That didn’t stop us from testing, it just made the statistics more complicated.
Sure, but in many cases, such as the example given by GP, long-term AB testing is hard or almost impossible. For the testing to have validity, you need the A and B cohorts to be stable, and have little or no overlap, and that is hard for long time spans for anything that is not account based (and somewhat dangerous even for account-based things, as people will almost certainly start to notice that they are getting a different experience than their peers, which may upset them).
In online dating, at least, this is a non-issue. Using an online dating app is, ironically, a solitary enough activity that people don’t go around comparing whether their UI is different from their friends’ UI. You of course can’t let the same user see two versions, but that just means doing permanent group assignment on signup. We used to A/B test subscription prices over enormous ranges (e.g., randomly giving some people 90% discounts) and approximately nobody noticed outside of obscure Reddit threads.
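For what it's worth, "permanent group assignment on signup" in practice is usually just deterministic hashing of a stable user id, so the same user always lands in the same variant without storing any extra state. A rough sketch (the names and the salt-per-experiment choice are illustrative assumptions, not how any particular site did it):

    import hashlib

    def assign(user_id: str, experiment: str, variants=("A", "B")) -> str:
        # Same user + same experiment -> same variant, stable across sessions and devices.
        h = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
        return variants[int.from_bytes(h[:8], "big") % len(variants)]

    # assign("user-42", "pricing-2022-07") always returns the same variant for that user,
    # and salting by experiment name keeps different experiments' splits independent.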
I wonder if you two are talking past each other a little. I'm thinking that A/B testing for content is a different beast than A/B testing for experience.
I’m not disagreeing — My point is really, “not all AB testing is bad, even if the kind you’re most familiar with leads to shitty content.” My second comment was just more of side note.
Sure, this is a niche with some very nice properties for this type of thing.
OkCupid has completely destroyed its interface and utility, so whatever they’re doing doesn’t seem to be working anymore.
I left in 2015, as soon as it became apparent the party was over. OkCupid went downhill for a lot of reasons, but overly aggressive A/B testing wasn't one of them.
What do you think the reasons were?
Did you A/B test the matching algorithms?
Don’t recall specifically. There maybe may have been randomized A/B tests on the “special blend” at some point, I think — it was never spelled out on the site, but I think that was the experimental mix du jour and we tended to use that instead of forced randomization.
I write A/B tests for headlines for a news site, this is too broad a generalization. Clickbait titles aren't great for building subscribers or establishing trust, which is what we really care about (LTV). To the author's credit, our deepest testing insights come from analyzing a lot of historical data (not just last week's).
I’m a huge fan of metrics. Huge! But they are worthless when not combined with qualitative experience. AB testing needs to be combined with human-centered “actually talking to people about their experiences.” Otherwise, you drift and the metrics no longer match the objective.
> but the constant use of clickbait headlines will over time destroy overall traffic to your site
I'd add a bit of nuance here. They are very good at driving traffic, but very bad at building an audience. You do this long enough and your news site is now optimized for attracting hot-take appreciators who engage with the news like a tabloid. This drives away everyone who doesn't want to be reading a tabloid and makes you more dependent on keeping up with traffic-gaming strategies to continuously drive traffic. You've basically shifted your business from being a place that produces journalism to being a place that figures out ways to game social media trends and SEO.
Indeed,
If the test is "do more ad placements increase revenue" and then there is a 20% jump, what are you as an engineer going to do? Tell management that it's bad?
I say this a lot and I will keep saying it. Conversion != customer obsession. There is a place for A/B testing. It is necessary and can be extremely beneficial in helping your customers enjoy and use your product more successfully.
The main issue is that people mix conversion with customer obsession! Whenever you work on a product or feature you should be asking yourself "Is this really good for my customer" - if the answer is no, then no matter what the A/B tests/conversion rates show you don't do it.
Unfortunately we mostly hire the wrong people as PMs, who then hire clones of themselves. They are not truly customer obsessed and use A/B tests incorrectly which results in products that trick or force customers to do things they don't understand/want to do. Long term this is bad for the product and company
My 'favorite' silly thing PMs do is UX research studies (typically on 5-10 people) and essentially ask completely untrained people if we should go with X/Y or Z. It's a super-effective way of avoiding responsibility for product decisions ("the data suggest we should go with Y"). If only building good products were as easy as asking what customers think they want.
Either they're doing the UX research wrong or (more likely) you're misunderstanding the process. You don't ask them if you should do X/Y/Z. You ask them to do X in the program, and see that none of them can find widget Y which controls it because they keep clicking on widget Z.
It's about observing the users fumble through your UX when you know their motivation.
> It's about observing the users fumble through your UX when you know their motivation.
Some time ago we did such a test. We called 10 customers to our offices and had them do some flows in the application. They didn't fumble. They pretty much did what they had to do and left positive reviews.
That whole thing got scrapped because consultants convinced our CEO that qualitative data is not good for global scoped startups, and that we should be building based on quantitative data.
Honestly, in less than a year our customer experience was already taking a dive because of all the extra little features and strange UI elements we kept adding; it became a confusing mess, and our tracked NPS (Net Promoter Score) showed that. I've since left the company, but I check on them from time to time and they never really recovered; they continue doing A | B in the hopes of hitting that sweet spot. It's just an unrecognizable monster at this point, in my opinion.
Data analysis is the lowest common denominator of business thinking: the simplest, easiest thing that feels meaningful and objective. Anybody can sum up two lists of numbers in Excel and see which one is bigger.
I wish the problem were my misunderstanding the process, because then I could fix it easily by learning more about the process. I do get where you're coming from though.
only listen to customers problems and never their solutions
The term "customer obsession" has become a red flag for me when interviewing because I've never worked at or chatted with a company that had "customer obsession" as value that wasn't aggressively working to squeeze every dime from their users with zero interest in whether or not this squeezing was harmful to the customer.
An actual, sincere customer obsession (and btw I think we both completely agree here) means that you are willing to lose out on some conversion and revenue in order to make sure your customers are top priority.
Real customer obsession isn't just an ethical principle either, it makes business sense. The problem is that the value of customer obsession is realized over the span of years or decades. Companies that have a sincere customer obsession are the kinds of places that survive economic ups and downs, where people's children grow up and are loyal to the product because they remember the time their parents were treated well by the company.
If your only company focus is Q4 KPIs then you really can't have "customer obsession".
> The main issue is that people mix conversion with customer obsession!
The logic is: If they hate your app, they won't spend money. If they love your app, they will. Which is what would make you think A/B testing and UX work are the same thing.
There's really nothing new about this issue at all. Playing towards the average creates a lot of shitty stuff, in apps/websites as well as politics and wherever else there are metrics to track.
The genius of a good product is that it will make a stand and not give in to the whims of over-optimization in order to maintain its original intent. This is what made Apple unique.
It requires leadership with guts who aren't chasing the latest shiny object.
Yeah, this is key. Improving a product in the direction of customer intent vs against customer intent.
A/B testing is local optimization. It should only be done on a mature(-ish) product, once you have given up on finding the global optimum.
Running experiments and A/B tests are popular because it is _guaranteed_ to give you signal. If you have a large engineering team and you're not sure how to filter the quality of results, gating everything through A/B tests is a well-understood, methodical way to ensure only positive work makes its way through.
Early stage startups should never A/B test. When you're searching for product market fit, you're doing global optimization within the search space. Your product will change drastically as you make new learnings. Premature optimization (A/B tests) will only be detrimental.
> Running experiments and A/B tests are popular because it is _guaranteed_ to give you signal. If you have a large engineering team and you're not sure how to filter the quality of results, gating everything through A/B tests is a well-understood, methodical way to ensure only positive work makes its way through.
It's almost guaranteed to ensure only false positive work makes its way through. If you're picking 0.05 as your P value, and you're running dozens to hundreds of tests, your false positives are almost certain to exceed your actual positives.
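A quick back-of-the-envelope check makes the point; this is just the standard multiple-comparisons arithmetic with illustrative numbers, not anyone's actual test log:

```python
# Sketch: with alpha = 0.05 and no real effects anywhere, how many "winning"
# variants should you expect from N independent tests? Numbers are made up.
alpha = 0.05

for n_tests in (10, 50, 100):
    expected_false_positives = n_tests * alpha
    p_at_least_one = 1 - (1 - alpha) ** n_tests
    print(f"{n_tests:>3} null tests: ~{expected_false_positives:.1f} false positives expected, "
          f"{p_at_least_one:.0%} chance of at least one")
```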
When I'm working for clients that do a lot of A/B testing, I suggest that they should always run A/A tests to ensure that they're not incorrectly rejecting the null hypothesis. If your A/A tests are showing significant differences, you have a problem in your testing pipeline that by definition can't be cured by more testing. You need holdout groups and selectivity about what to test, instead of just throwing everything at the proverbial wall.
Even checking A/A tests won't surface all the issues. A proper A/B test is one that samples over a long enough time to adjust to the true audience of the service.
For example, imagine a costume shop that ran a couple dozen A/B tests over the summer. Those results may look statistically significant. They may even stand up against the A/A test. But people who buy costumes in the summer are very, very different from people who buy them in October, and if 90% of the store's business is in the run-up to Halloween, then all these micro-optimizations could actually make your total business performance worse.
I'm an A/B testing skeptic too, though I admit it has a time and a place. My favourites are tests that can be reasoned about as actual hypotheses. This usually involves some degree of data analysis or segmentation. For example, increasing font sizes may boost conversion, and a later analysis shows that this was almost solely a lift in conversion rates amongst the 45+ cohort. The data in this case isn't just blindly driving design decisions; it's helping inform the staff on how to better design in the future for the audience we have.
Well, if you are running hundreds of tests with a 0.05 p-value, you will get plenty of A/A tests that falsely look fine, and there isn't much reason to expect them to be correlated with actual signal in your A/B tests.
A/A tests do test your methodology, as you said. But they do not fix a p-value threshold that is an order of magnitude looser than it should be. (And yeah, I'm aware you know that, but your comment places them in the same context, so it reads as misleading.)
Great insight. Without this approach, A/B testing could be used to generate an infinite stream of meaningless work
The first company I worked for, and also the first company I saw A/B tests at, once ran an A/A test because someone was a bit skeptical about some of the test results that had been claimed.
Predictably, whatever metric we were watching on it (probably conversion) swung wildly to either side over the first few days. The look on some of the product managers' faces was pretty great. After about 2 weeks, it settled into a steady state where each "version" performed equally (measured cumulatively, so just the law of large numbers in action).
The conclusion from this exercise was...
"It takes 2 weeks."
¯\_(ツ)_/¯
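For the curious, a minimal simulation with entirely made-up traffic numbers (and the same true conversion rate in both arms) reproduces that experience; the cumulative rates swing around early and only settle as the sample grows:

```python
# A/A simulation: both groups share the same true conversion rate, yet the
# apparent "lift" between them bounces around for days before settling down.
import numpy as np

rng = np.random.default_rng(0)
true_rate = 0.03        # assumed conversion rate, identical for both groups
daily_visitors = 5_000  # assumed traffic per group per day

conv_a = conv_b = seen = 0
for day in range(1, 15):
    conv_a += rng.binomial(daily_visitors, true_rate)
    conv_b += rng.binomial(daily_visitors, true_rate)
    seen += daily_visitors
    lift = (conv_a - conv_b) / conv_b * 100
    print(f"day {day:2d}: A {conv_a / seen:.3%}  B {conv_b / seen:.3%}  apparent lift {lift:+.1f}%")
```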
That's why we calculate things like the effect size and power of a test (or, even better, use Bayesian statistics); p < 0.05 on its own is practically meaningless.
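To make "power" concrete, here is a minimal sketch of the textbook normal-approximation sample-size calculation for a two-proportion test; the baseline rate, minimum detectable effect, alpha, and power below are assumptions you would pick up front, not anyone's real numbers:

```python
# How many visitors per group before a given lift is even detectable?
# Standard normal-approximation formula for comparing two proportions.
from scipy.stats import norm

def sample_size_per_group(p_base, p_variant, alpha=0.05, power=0.8):
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided test
    z_beta = norm.ppf(power)
    variance = p_base * (1 - p_base) + p_variant * (1 - p_variant)
    return (z_alpha + z_beta) ** 2 * variance / (p_base - p_variant) ** 2

# e.g. hoping to detect a lift from 3.0% to 3.3% conversion
n = sample_size_per_group(0.030, 0.033)
print(f"~{n:,.0f} visitors per group")   # on the order of 50,000 per group
```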
"Running experiments and A/B tests are popular" ... because you can give up on your own judgment and opinions and say "the data says"
> give up on your own judgment and opinions and say "the data says"
The beauty of AB testing is that you don't have to give up your opinion. You can just change irrelevant things until the result you desire gets proven by chance and now you've got data to base your opinion on!
Even for a mature product where you might be doing A/B tests to explore hypotheses that you think will improve the product for the user, it is also worth considering doing mountain tests where you try to escape the local maxima.
Reading this makes me think of the handful of sites, often targeted at professionals, that highly optimize for the experience of actually buying things. McMaster-Carr comes to mind. Their users shop there over and over, and McMaster wants to keep them. So you can find things for $2 or $2000, shipping prices are inoffensive, customer service is friendly but rarely needed, and there are minimal distractions on the way to checking out or even after checking out. The only real issues are mostly related to the fact that they sell so many products that one can get lost in the 4000+ items that all match the search. Well done.
This is an interesting contrast to Amazon, which also makes checkout easy but bombards the user with thousands of listings, mostly mildly fraudulent and consisting of absolute crap, and still somehow gets repeat business.
McMaster-Carr might be the single thing I miss the most from my time in the US. It is like... stupid good. Their listing categorization is godlike compared to the alternatives.
The Amazon or Google way of throwing everything into a bin and spewing it out at users is BS. We say we live in an information age, but I firmly believe things were catalogued far better back when it was done manually by paid gatekeepers.
https://www.usplastic.com/ is another "industrial" example.
> This is an interesting contrast to Amazon that also makes checkout easy
Hey, would you like Prime with that? Do you know we provide free two-day shipping with Prime? If you sign up for Prime today you can get a $100 discount!
My biggest issue with A/B testing isn't even mentioned here... gaslighting your customers is absolutely not OK. Particularly with older people, the constant "where the f..k did Outlook now put feature XYZ" (in the case that comes to my mind, the CC bar which used to be tab-reachable, now you have to tab+space or manually click on the tiny gray "cc" letters) onslaught is just absurd. When you change how applications behave without telling the users, it's a direct attack on their muscle memory at best and makes them question their sanity at worst.
My second biggest issue is: it's rare that companies offer actual, live-human support these days anyway. When marketing adds A/B testing, shit becomes really annoying if something breaks as a result - usually the phone lines are suddenly flooded, the agents have no idea what has happened either and try to reproduce and figure out what's going on (and sometimes can't because they aren't part of the test group!), and so even people who haven't been in the testing group are going to be very pissed off.
IMHO, A/B testing without explicitly notifying the customers in advance should be banned by law, and that ban be harshly enforced. Customers are not guinea pigs, and with the rise of elderly people on the Internet this becomes an actual public safety issue (as ever-changing stuff makes it easier for scammers!).
You're downvoted, but this issue is more common than it seems; and I agree, more serious than it seems.
You're describing adversarial UI changes to small populations of then unsupported customers. This can have outsized impacts on vulnerable populations, eg., esp. the elderly.
Not to mention the lovely "This option that used to be here no longer is here" getting the response of "it's still here (for me)". YouTube specifically loves to do this to me.
> it's rare that companies offer actual, live-human support these days anyway.
This is one of my most intense frustrations in the modern age. Complete and utter disrespect for your customers' time and knowledge.
> Outlook now put feature XYZ" (in the case that comes to my mind, the CC bar which used to be tab-reachable, now you have to tab+space or manually click on the tiny gray "cc" letters)
I can sort of understand wanting to hide stuff on mobile, but the discovery of controls to unhide things should be better. I often help people that are stuck trying to figure out how to do something in an app and not realizing they can click on something that gives no indication it's clickable is a common thing.
Desktop is another world. I often have 20+ inches of horizontal space and a hamburger menu. It's infuriating, especially when the hamburger menu is hiding one button.
Love the post and I tend to agree.
As a product manager/owner, I've only found A/B testing useful when trying to narrow in on a specific demographic and you are trying to find some optimization.
The marketing/sales funnel use of it is kind of gross and has ruined, IMO, something that has utility in a very narrow scope.
Cheers, also very much agree customers should be informed and allowed to opt out.
'Hey, we have a new UX to try... would you like to switch?' The data from people that opt in is way better.
I had a PM who pushed us to A/B test _everything_. We hired a new graphic designer who suggested that we change our product links from ALL CAPS to Title Case (a very popular idea on the team, and his first real suggestion after a few weeks with us), and she insisted that we A/B test it first. It felt like an insult to him, and a dumb test since title case looked way better.
The three key outcomes I observed from the relentless A/B testing were UI antipatterns, team burnout, and a well-attended conference talk about "how we ran 105 A/B tests in a year, and what we learned".
I've had a similar experience, although my learning was "people, even experts, are really terrible at understanding which treatment will perform better".
We always ran >=3 variants and surveyed the dozen team members on which one they thought would win. Over the years, there was no clear pattern in who could make that prediction.
I.e., it's not possible to predict which is the most effective treatment, even when you include a really bad idea among the treatments!
Was one of the learnings "Everyone hated the product manager"?
I give an incredibly similar warning every time a company I'm working for starts trying to dip their toe into A/B testing. I have a lot of experience with it at scale (including at a Fortune 100 company), and I've even built an a la carte testing framework in AWS for a company that didn't like Target or Optimizely.
Every single time, I warn them that the bill of goods they've been sold with A/B testing is almost completely unattainable, especially in the way they want to go about it. They won't magically start getting more conversions by changing a button color. Even if they start getting more clicks, they rarely get more completed conversions, because the increased numbers usually come from people who weren't good leads in the first place.
On top of that every company I've worked with has no idea what the real methodology for good tests is, no matter how many times I explain it or put it in a slide deck. I would constantly get requests to use A/B testing for feature rollout.
Them: "Hey, could you do an A/B test of our existing site design and our upcoming redesign?"
Me: "if the old design performs better are you going to toss out the redesign?"
Them: "No we're going with the redesign but we want metrics on how it'll affect traffic"
Me: "Those metrics are useless if you aren't going to listen to them, and if the results come back and the old design performs better, you're not even going to put it in a presentation because it's counter to your planned actions. There's literally no point in running this test"
Them: "Run it anyway"
I recall when Booking.com rolled out the false urgency features. I was amazed at how utterly trashy and desperate they were.
The problem is it's not subtle at all; there's a handful of those features that, when combined, end up being overbearing and noisy: "3 people looked at this listing within the past 3 days! 12 rooms left at this rate!" I don't care. I'm looking to book business or vacation travel. If a spot fills up I'll just go somewhere else. It'll be fine either way.
I don't use them anymore for that reason. Old soul (me) is old. (I'm probably in a minority, judging by their advertisement budget.)
Booking.com (or any other hotel/flight booking sites) are the masters of dark patterns. This is what happens when software is 'finished', companies start to optimize for profit, regardless of customer experience.
But unfortunately, it works.
I've seen friends that I consider intelligent panic buy tickets/hotels, "because prices are going up since the last time I checked!"
Next time you want to book anything, browse around, ignore any of the fake urgency notifications, ignore the price (while staying broadly in your price-range, of course). Then when you found a destination you like, open the page in a private browsing window (or clear your cookies), and you'll see that prices and availability are back to normal.
Interestingly enough, Booking did this ridiculous amount of A/B testing almost from the start. Super hardcore data-driven company. There is a great book about them, written by 3 journalists, but I think it's only in Dutch. https://www.amazon.com/Machine-ban-van-Booking-com-Dutch-ebo...
Having worked in the OTA space, every time I see their (sometimes funny) ads, I want to call them Booking.Nope
OTAs make comparisons a bit easier, but everything is negotiated and contractually controlled to keep people from just going to the hotel directly. Secret hotel prices (like Hotwire, if that still exists (Expedia), or Travelocity's Top Secret Hotels, if that still exists (also Expedia)) are an even crazier negotiation. Hotel Tonight at least used to contact the hotel chains every day for that day's options, though since they were bought by Airbnb who knows what they do.
These days I just find a nice hotel and book with them/their system directly. Airlines too, since airlines fail to give all their options to the OTAs.
In some ways it's sad that aggregators don't work all that well in the mainstream travel industry (flights/hotels/cars), but travel is extremely complicated, highly competitive, and still very fractured except for airlines. Price comparisons are not very useful since prices are so mangled and obfuscated that you may as well just go to several sites and do it yourself by hand. For example, Spirit Airlines used to give us prices for their tickets at $X and were always cheaper than everyone else; yet once you booked at that price they hit you for everything extra (bags, res, for all I know oxygen), and then our customers complained that we were fooling them and the real cost was higher.
It's amazing how much this works though. I remember getting a call from my mum saying "the website is telling me London is already 78% booked for these dates!" It felt ridiculous having to say "Mum, it's March. You're staying there in November. I promise you it'll be fine..."
Imagine a beautiful piece of business software which, over the years and through numerous A/B tests, "best UX practices", design languages, and whatnot, became an all-"Applesque", minimalist UI with 80% of it being white space. Winning numerous design awards along the way, by the way.
However, entering e.g. a client's information takes a lot of steps; you are constantly clicking "Next" throughout these beautiful wizards and pages. After some time everybody starts to feel that there must be a better way.
What is the solution?
Spreadsheet import! Where you can just do everything in the "complicated" UI of Microsoft Excel, with formulas and a hundred buttons on the screen at once, fill in hundreds of rows of information, and just import it into the "beautiful business system".
This post is right up my alley, as I'm a.) the CEO of an AB experimentation platform www.geteppo.com and b.) I'm an Airbnb alum where we rolled out an analogous feature that labeled a listing as "this is a rare find!"
And the funny thing is, I agree with this article. Both the content and the heading of this hackernews article:
1. Notification/scare spam can have long term retention ramifications. The previous generation of experiment platforms made long term metrics literally impossible to read. But now companies can use holdout groups and long term metrics like retention to give more clarity.
2. Even if you can read long term metrics that include retention, the scare/notification spam could lead to less word of mouth growth. For travel, I am guessing that you will be swayed to drive WoM growth more by differentiated inventory, reliable service, and cost, so maybe it's just merely annoying but not a risk to the business.
3. Notice that Airbnb's UI is very, very different from Booking, Expedia. We made a conscious choice to always make sure Airbnb came across as a "sincerely helpful friend" as a booking platform. An AB experiment showing that metrics improve doesn't mean that you have to launch it. You can look at those results and say, "this metric lift isn't worth how ugly it's making my site", and that's a completely valid choice. (a choice we made often at Airbnb)
I absolutely agree that A/B testing in the way described in the article is a catalyst for creating dark patterns in a UI. Because dark patterns work, they deliver short term increases in particular metrics.
The author's idea is that this short term gain damages longer term metrics. That sounds logical and agreeable, but that doesn't make it true. Not in my experience anyway.
Probably the people complaining the most about annoying UI patterns weren't going to convert anyway. Whilst those coming with a specific conversion goal to your site will convert even if annoyed in the process.
Anyway, the true root cause goes all the way to the top. When you give a team a 20% sales increase target and "deliver by next quarter or be fired"...this is what you get. If the executive level dismisses a healthier, more sustainable long term growth model, then there's pretty much no way to stop this.
It's so hard to stop because it actually works. It works in the short term, and evidence that it harms you in the long term is typically lacking, or it simply isn't true.
"If a study came out that said deafening high-pitched noises increased conversion rates, we would all be bleeding from our ears by end of business tomorrow, right?"
Netflix auto play? Is that you? You were a hateful idea, no one liked you, yet you stubbornly hung on for far too long
I'm convinced that Netflix uses "number of hours watched" as a success metric. Autoplay raises that.
I'd pay twice as much to watch half as much quality programming, but that would tank what they think is a positive metric.
I'm pretty confident that that horrorshow was a case study in A/B testing: People can't decide what to watch and agitate over flicking through titles, so some bright spark had the idea of just starting stuff in the background to see if that helped.
It did! People started watching stuff sooner!
Mostly because it was so incredibly irritating that I'd start a show just to make the random autoplay cease. Of course, what it really meant was that I'd go to Amazon and agonize over what to watch, but at least I was doing it at my own speed and not being harassed by noisy autoplay.
Getting a 503, so here's an archive: https://web.archive.org/web/20220712122630/https://www.zumst...
Intentional or not, one outcome on sites that are relentlessly A/B tested is that the resulting UI design lets users know that content they want is there, they just need to click and scroll a bit more to find it.
Having left FB years ago, I now watch people "navigate" their site/apps with disbelief.
Isn't that exactly the problem? The resulting UI isn't designed, it's aggregated across a disjointed set of granular tweaks.
This is part of the "unchecked" part of AB testing the headline mentions.
You, of course, need to ensure the granular tweaks can be rolled up into something usable as they prove successful. You can't just keep bolting on UI changes while losing sight of the larger experience. Each incremental A/B test runs against a previously successful variant, so eventually the control is radically different from where it started and you're only concerned with beating the control. Using a longer-term holdout group, or resetting the control experience during incremental testing, can help mitigate this and zoom you out a bit from the local maxima.
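Mechanically, a long-term holdout can be as simple as deterministic bucketing on a stable user id before any per-experiment assignment happens; here's a minimal sketch (the salt names and percentages are invented for illustration, not any particular platform's scheme):

```python
# Deterministic assignment with a global holdout that never sees the
# accumulated tweaks, so the "fully optimized" experience can be compared
# against the original one months later.
import hashlib

HOLDOUT_SALT = "2022-global-holdout"   # hypothetical salt
HOLDOUT_PCT = 5                        # percent of users kept on the original experience

def bucket(user_id: str, salt: str, buckets: int = 100) -> int:
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % buckets

def assign(user_id: str, experiment_salt: str) -> str:
    if bucket(user_id, HOLDOUT_SALT) < HOLDOUT_PCT:
        return "holdout"   # excluded from every incremental test
    return "B" if bucket(user_id, experiment_salt) < 50 else "A"

print(assign("user-12345", "checkout-urgency-banner-v3"))
```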
A problem for who? Given that people already invested in the product ecosystem seem to have almost limitless patience to scroll for the right content, I'm sure it improves almost every user time and attention metric.
It's why I saw it as my moral duty to leave (as well as the other FB properties), so that at least in a small way, I "produce content" that is only available by interacting with me as a person.
Yes, it's possible to use A/B testing in a short-sighted fashion. Yes, I'm sure there are plenty of examples thereof.
No, that doesn't mean A/B testing is inherently short-sighted. It's entirely possible to measure long-term secondary effects of an A/B test. Just save a record of treatment groups, and remember to come back and compare long-term metrics like LTV down the road. We do this all the time at my startup, and of the dark patterns that we've tested, we rarely see a long-term negative impact on LTV that outweighs the positive conversion rate impact.
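As a minimal sketch of that kind of delayed readout, assuming a hypothetical assignment log and LTV table (all names invented for illustration):

```python
# Join the original treatment-group log against revenue observed a year
# later, then compare average 12-month LTV per variant.
import pandas as pd

assignments = pd.read_parquet("experiment_assignments.parquet")  # user_id, experiment, variant
ltv = pd.read_parquet("customer_ltv.parquet")                    # user_id, revenue_12m

readout = (
    assignments[assignments["experiment"] == "urgency-banner-2021"]
    .merge(ltv, on="user_id", how="left")
    .fillna({"revenue_12m": 0.0})          # users who never came back
    .groupby("variant")["revenue_12m"]
    .agg(["mean", "count"])
)
print(readout)
```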
If you want to make a valid argument against dark patterns (which is basically what 90% of this thread is trying to do), it's unlikely to be grounded in efficacy. This is coming from a business owner who spends seven figures monthly on advertising, constantly split tests, and is heavily invested in only making decisions that are in the long-term interest of the business.
Yes, the author was not talking about A/B testing at all, but about popups and annoying UX patterns that get rewarded by poorly used A/B tests or short-term thinking. But there is nothing stopping anybody from using long-term A/B tests (I assume most companies do, like you). Unfortunately dark patterns might simply be profitable.
AB testing shows zero respect to your customers. It is the equivalent of testing your theories on lab rats.
Instead try to improve the customer experience, make better products, improve customer service.
That's a very weak blanket statement, there are totally reasonable A/B tests you can run that don't deteriorate a user's experience, and the results can guide you to a better customer experience overall.
I did not mean it too seriously; of course there are also good AB tests, but there are a lot of bad ones out there. Those are what the article was about.
(edited for clarification)
Obviously, it depends on what the A/B test is about. Whether or not to pester the customer for the sake of some shortsighted metric is a bad thing to test; deciding what content should go above the fold (e.g. Amazon places images, short description, details/specs and similar products in that order) is a good one.
>Instead try to improve the customer experience
AB testing can be (although isn't always) used to improve the customer experience. Assuming you know exactly what will make the customer experience best without actually testing it can also lead to a worse experience.
A/B testing helps you maximize a metric, not make customer experience better. Those are different things.
Better customer experiences often times lead to increased metrics. They are not totally different things.
Maybe if you are very aware of the fact that your goal and metric aren't totally aligned, but that often gets lost. As a result A/B testing for longer website visits can make websites that make it more obvious that the information you want is there, but also make the path to actually get it longer. A/B testing for engagement might promote divisive behavior and fights. A/B testing for read rate or clicks might lead to trust loss.
I think a lot of lessons from AI safety apply surprisingly well to A/B testing, mainly around how hard it is to align your actual goals with the metrics you use for optimization, and how disastrous the consequences can be. It doesn't have to go wrong, but it's incredibly hard to ensure it goes right, especially if it's the only feedback you have.
I've spent a lot of my career doing A/B testing, including doing that role exclusively for a number of years. I specialize in ecommerce, so maybe I have too narrow of a view here, but in that vast majority of cases, I am optimizing for revenue per visitor, which is a function of conversion rate and average order value. There are sometimes leading indicators like engagement, but in ecommerce, you're afforded the luxury of basing things on revenue or even bottom line.
I really don't like the positioning of ALL A/B testing as unethical behavior where you're hostilely trying to take advantage of a user. It's quite the opposite. There are a lot of extremely poor user experiences out there and a quality testing program can help improve user experiences, remove risk from making sweeping changes, and help you learn more about your audience and market.
The vast majority of the successful testing I've done is done around trying to HELP users navigate the site and product catalog, understand the product, and purchase the product. Attention spans are fleeting with online shopping and even the smallest points of confusion or friction can turn shoppers off.
Additionally, often times I'll read into test results after a month or so to see if there were any issues with orders that might indicate purchases from disinterested people or misaligned expectations.
But that's just optimizing for bad metrics. At this point, anyone who thinks "engagement" and "time spent on page" are customer-positive metrics is in a different mental space than you and I. There's a lot of ineffable things that make up good customer experience that would be hard-to-impossible to A/B test, but it doesn't mean that A/B testing is "unsafe" just because it could be used to optimize for bad things any more than any other telemetry or metrics gathering could be bad because you could optimize for evil things. And at the same time, bad management and product leadership can optimize and develop towards bad goals with plenty of tools that aren't A/B testing.
It seems to miss the point to blame/stigmatize a specific tool because it's been used poorly by a few bad actors in a public way.
I think the point is that most metrics are "bad metrics" for this purpose, as suggested by Goodhart's law.
https://en.wikipedia.org/wiki/Goodhart%27s_law
Further, I imagine that the obvious "known bad" metrics are not selected only by "A few bad apples". I think it's likely they are selected by the mass of business actors looking for current quarter results.
For sure. I don't think there's general "overall" metrics that you want to be testing against every single time on every change outside of basic performance metrics for loading or rendering in real-world environments.
I wasn't at all trying to say that only a few places are optimizing for bad things, but as you see all over this thread, there's a number of companies that immediately come to people's minds as bad actors when it comes to A/B testing - Google, Meta, Microsoft. There's plenty of other companies that are more ethical about it, or use it as part of rolling out general changes and collecting feedback. I know half of the time I log into the AWS Console it has some sort of "Hey, we're testing out a new upcoming UX for this page. Click here if you want to go back to the old one", which seems like a decent way for them to get feedback on the new designs while not drastically disrupting things.
Bad management can certainly ruin things without A/B testing.
It doesn't excuse A/B testing simply being a poor tool among all you have access to. Talking to users and stakeholders, for example, provides infinitely more input. (Edit: yeah in many cases measuring what users do, directly watching or via analytics, is also useful.)
Definitely - I'm not trying to say A/B testing is amazing, just that a lot of the comments have a strong "if you do A/B testing you're evil and are out to manipulate people" bent to them, which I think is too far in the other direction.
Talking to people is great, but getting a representative sample is hard, and often people are bad at both understanding what they want, expressing it, or even being accurate about how they use things. I know when I was working closer to the UX side of the business before, I was constantly surprised by both what users would say they want AND by how users actually used the products.
In my mind, A/B testing is good as a sort of "final pass" to serve as broad, semi-random validation that the change you're looking to make does actually do the thing that it's intended to do. It's not great for early on when you don't really know what to measure or look for, or if the change is remotely reasonable, but it can help check for if your focus group/user panel happened to be weirdly skewed in their usage/desires.
They're only different if you've selected bad metrics. If you've got two different search algorithms, running an A/B test and measuring how often the user selects the first item returned is a good measure for how well your search algorithm is returning the information the customer wanted, which is good customer experience.
They are always different. You cannot run a conclusive A/B test for customer experience.
A search engine, a single-purpose tool, is about as simple as it gets in terms of customer experience. Still, a good search algorithm can make me click on the first result because it is good, and a bad search algorithm can make me click the first result because the results are so bad that scrolling further is a waste of time, especially if I already had to scroll past widgets and ads to get to the first result.
It's not just about selecting good metrics; it's about the higher-level picture that A/B testing can never give you.
> Assuming you know exactly what will make the customer experience best without actually testing it can also lead to a worse experience.
For that you usually hire a market research company or do what they will do: take an interviewer, two cameras (one front-face, one top-hands) and hire an as-diverse-as-possible pool of test candidates that you then put through whatever workflow optimization you want to do. Then afterwards, you interview them - side benefit, you can get really interesting general side knowledge that you'd never gain from a dumbass A/B scheme: is your font style/color scheme legible, can the site be used by colorblind people, are there stock photo choices that give off stereotypical vibes...
It's real fun and a worthwhile experience for everyone involved.
It's not really an either/or option. You can use testing to validate the changes stemming from market research.
Having seen lots of site redesigns go horribly wrong at the hands of 100% earnest people trying their best and utilizing the research that was afforded to the process, I always recommend incrementally testing changes on high-value / high-risk applications, even when the "improvements" were backed by solid research. You never know until you release.
The "or" was meant to be the distinction on who does the user testing - I've seen both in-house testing operations and outsourced ones. For small scale operations, it may actually be cheaper to run them in-house and only hire external testers... cameras are dirt cheap these days.
Hiring a market research company is usually worth it if you have a contract with them anyway (which gives you better rates on the testing) or lack someone on staff who knows how to deal with cameras.
I have experience where the company paid a UX agency to create a flow that was, by all standards, a better customer experience and a better product, and nicer too. They ran an AB test, and it turned out people were more likely to pay with the old version. AB testing is good in that it challenges what UX people think is the better experience or product with hard metrics.
Which is why A/B testing is an important part of the UX toolkit. It's a tool among others, and is one way to validate assumptions. A good UX designer will try to base their designs on data and reasonable hypotheses drawn from the data, but a new design or flow is necessarily based on some amount of assumptions, so it requires validation.
That said, an A/B test does not tell you why something didn't work. You can make further assumptions based on the results and develop new hypotheses, but it never tells you why. Typically you would do some kind of qualitative UX research on a prototype or even static concepts beforehand to identify these kinds of issues before you even expend the effort to do a live A/B test. Far cheaper to do a study with 6-12 people and a prototype than to build out a full, functioning A/B test experience.
It's possible the flow they created was generally better but perhaps it had one fatal flaw. Perhaps that flaw could easily be remedied once identified.
A/B testing is just one small part of a good UX process.
>Instead try to improve the customer experience, make better products, improve customer service.
Without a metric to say what is "better" and a method to measure it this is empty advice.
There are many other metrics that do not involve AB testing. You can just survey customer experience before, during and after a purchase for example. I never said to throw out all metrics.
With AB testing you are optimising for a specific outcome, usually higher conversion. As pointed out in the article, eventually you'll end up with a bunch of colourful buttons and scary texts that persuade the user to click. A lot of the "only 2 seats/rooms available" messages are lies to scare the user into a conversion.
But does AB testing provide the only or even best metric for that? It probably is the cheapest way requiring the least engagement with the lab mice.
AB testing is how you isolate a change and measure the impact. It's the only real way to be able to associate cause and effect. Best you can do otherwise is measure something over time while making changes. You can try and correlate changes with outcomes but it's hard to be sure the change is what drove the outcome.
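To make that concrete, here is a minimal sketch of the textbook two-proportion z-test that turns randomized A/B counts into a significance statement (the counts below are invented):

```python
# Two-proportion z-test: did variant B really convert better than A?
from math import sqrt
from scipy.stats import norm

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return z, 2 * norm.sf(abs(z))   # two-sided p-value

z, p = two_proportion_ztest(conv_a=1450, n_a=50_000, conv_b=1560, n_b=50_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```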
That sounds pretty accurate. Anyone who's gone through some econometric classes would know that split testing is (almost) the only way to get to (close to) causality - all the rest is black magic called correlation which can also lead you to the conclusion that Nicolas Cage's movies lead to people drowning:
https://www.thenationalnews.com/uae/nicolas-cage-movies-link....
https://www.wnycstudios.org/podcasts/otm/articles/spurious-c...
Maybe Apple will come up with a reality distortion field and remove "urgency" warnings and information from websites in Safari, as well as blocking "Join our newsletter now and get a discount" pop-ups.
What once was ads everywhere is now psychological gaming.
I hope someone comes up with a browser extension, and maybe Apple with a new "Access Website" mode.
These messages are boring, to be honest. Once you notice them everywhere, it's game over for me. Time to move on.
For many businesses revenue is a function of aggressive deal making. Full stop. In an undifferentiated market of discretionary (impulse) purchases if you don't hustle the customer you make less. The author of this article is confusing companies that are bad at hustling with hustling being bad.
One time offers, limited time offers, mailing list signups, up-sells, and cross sells are time tested ways to increase sales dating as far back as radio era telephone and catalog sales.
Steve Madden is a perfect example of this. They sell undifferentiated popular shoe styles less expensive than high fashion but more expensive than knockoffs. They have to hustle you to get you on their mailing list (for 10% off your order) in the hopes that you'll make another impulse purchase later when you get a text or email from them. If they weren't as aggressive you might never make another impulse purchase with them again as there are tons of brands selling nearly identical products.
Some companies are just horrible at hustling so they actually get in the way of you completing your purchase. In a competitive market this is a self correcting problem.
Anecdotal: we released plenty of improved features, like a better gallery for viewing the items in our shop. Users used it a lot (+250%), but the conversion rate went down 4%.
They spent more time looking at the items and... didn't like the pics, so conversion went down. In the end we reverted to the crap gallery we had before; they don't click through it anymore, and conversion went back up.
Is it possible that there'd be a long-term effect like:
* Users know you have a nice gallery
* They are more likely to shop at your store
* In the end, you get more sales despite the lower conversion rate
I agree with your point, but after finding out that in this industry you just need to be able to monkey some code to be called an engineer, random people are now data analysts because they can pull "experiment A revenue up, experiment B revenue down" and call it a day.
If you really want to see a massive amount of additional offers and small/partially hidden "no thanks" links, check out the work flow to reserve and rent a small light duty trailer with U-Haul.
You have to click through at least 10 pages of additional offers (and many extra price things that are added by default!) before you get to the actual checkout page.
As a rule, I open sites I’ve never been to in a new tab, with my hand hovering over the Cmd and W keys. The moment I get a popup of any kind that obscures content, asking me to sign up for a mailing list or offering me some bullshit coupon, my fingers come down and I close the tab. I really hope the A/B tests actually show a bounce here, but I doubt it, because I’m actually closing the tab, not using the back button (too many sites hijack it), and I’m not sure they can detect that I’ve left. I also do it so quickly now that they probably don’t even get the signal that I saw the popup.
Site owners: please stop doing this. You’re turning the web into a cesspit. You’re part of the problem.
/rant
AB testing is and always has been fish oil for management. The only things it can actually prove are more easily identified by common sense. So wherever it actually works, it was probably a waste of time / overkill as evidence.
- sincerely, a business analyst
Have to disagree. I've found plenty of issues that affect real production users through the use of AB testing. Problems that were small enough to escape review, testing, and reporting, but large enough to be stat-sig. They always led back to a bug or an issue with test vs. control.
I will always use AB testing for uncertain code in the future. I was skeptical when I first started writing AB tests, but they have proven their worth over and over again.
Sure, but that's not really A/B testing, those are more often called staged rollouts or progressive rollouts.
I'm talking about running week or month long tests with control and multi test cells containing new functionality, configuration, or code to determine the viability of a single or combination of changes by analyzing statistical output driven by p-value and pre-determined target metrics.
These types of experiments are extremely valuable in uncovering hard-to-find bugs, assuming you have sufficient logging and confidence around your metrics. They let you know a problem exists and roughly where it is in the product. From there you can drill down and investigate your source code until the discrepancy is found.
This makes sense to me. Not the kind of AB testing I had in mind, but fair point. I was thinking more about decision making processes, not operations troubleshooting.
Yes, I know this and I agree. I understand they use the same techniques and terminologies, but this is just not the kinda thing the article is criticizing.
A nice way to summarize this article to think about local maxima and global maxima.
A/B testing right now is done on a cohort basis, and tests are run for weeks to a couple of months. This means that where the lifetime of a customer extends beyond a few weeks or months, it's really not possible to tell whether a global maximum was missed.
E.g., you increase the number of promotional emails customers get per week. You do it for 3 weeks and see that customers who got those emails converted more. But you didn't get to see that customers who kept getting that higher number of emails unsubscribed completely after 3 months of pain. And by that time all customers are in the higher-frequency group, so it's hard to tell what's driving the unsubscribes.
I'm no expert but here are some solutions:
1. You should have really delayed, long-running control groups, preferably running well beyond the average time a customer sticks around. These groups would only get the new features a year later. Even then it wouldn't be possible to isolate WHAT feature is affecting them, because in a year the main group would have accumulated a lot of features. But it's still something...
2. You should really have lots of secondary KPIs that measure things which affect long-term KPIs. Sure, conversion is better, but is time spent reading newsletters increasing? Are buyers feeling good about their experience with the brand? Some of these KPIs are more qualitative and can't just be automated.
what else?
In today's world of algorithms optimising marketing and constant updates to marketing channels, it is hard to say whether an A/B test worked, as the quality of users is never consistent.
I currently work in a game publishing company, here are 2 anecdotes from it
1. We ran an A/B test for game performance, but we kept changing the bids for our games and thus got users of varying quality; A/B tests don't really help in such a case.
2. Once, by mistake, we ran the same creative on FB for 2 different ads. Both ended up with totally different metrics.
> Next to some hotels, a message that supply was limited.
It's also worth noting that there's no way in hell they actually know that with any sort of precision. No GDS has proper up-to-date knowledge of bookings from all the various sources that hotel reservations actually go through (they overbook airline flights). What they're really saying is that the small inventory of rooms that are reserved for them to book exclusively are almost gone.
I think the main issue is that testing is misused to create a better version of something, when it should be used to create knowledge.
So if you do testing and it gives you some kind of result, the crucial step is trying to understand what it really means, is there something we can learn from it.
Unfortunately, this is also the hard part that requires actual effort and intelligence and is difficult to scale -- and so is frequently skipped.
This kind of optimisation for short term gains at the expense of long term sustainability is what is causing climate change and the collapse of the global economy. But the politicians and heads of industry who preside over this situation will all be retired/dead before it becomes a problem. Or so they thought.
archive: https://archive.ph/fuUPG
The "Hotel Tonight" example hit home with me, recently.
I used to use that app all the time; then kids happened and spontaneous hotel reservations became rare. Fast forward a few years and a circumstance came up that made me think "Hotel Tonight". I discovered it wasn't installed on my new phone, so I grabbed it. It was unrecognizable. Maybe the prices were as good, maybe it could still be used the way I used it previously, but it looked like it had turned into a hotel booking app, when what I wanted to see was a small selection of good hotels nearby with unusually low prices. One of the features was the lack of choice.
I work in this field. Author makes great points. Here are some additional thoughts:
* Don't use bad optimization to discredit all optimization
* Incremental A/B testing assumes that all accumulated features are independent with no interactions, which is almost assuredly a poor assumption as a website grows and space becomes limited
* A/B testing is the only reliable tool to test hypotheses; causal inference is great but not a substitute
* Incremental A/B testing (small changes) should be periodically coupled with mutational A/B testing (large changes) to tunnel from local optima to global optima
Whenever someone says "X tends to be bad", people will show up to say "not all X is bad, only bad X is bad" which is really beside the point.
Such perfect timing: I just tried to place a take out lunch order with a restaurant. Opening the page popped up a modal box that said "Join Our List Subscribe to find out about new specials, community events, store openings and more." There were no buttons to click, no place to enter my email address (had I wanted to) and clicking did not dismiss it. The modal had a background that obscured the actual page.
I finally opened the inspector and deleted it, so that I could use the menu to select "order online", which took me to a page ... with the same modal.
What an excellent write up.
I agree with the sentiment on AB testing but I think the bigger insight is that we need to be reminded to see the forest for the trees with any process, tool, or goal.
Sometimes these intangibles are hard to measure and almost need to be sensed.
It reminds me of how you can see the exact same development methodology used at two different companies, where at one company it works beautifully and at the other it becomes a bureaucratic albatross.
Hrm, looks like A/B testing has destroyed the website.
A/B testing can be powerful, but you quickly lose your editorial voice and your headlines become the same clickbait garbage that works for bottom-tier blogspammers. Look at a site like The Register. Could they use A/B testing to pick headlines? If they do, it's a light usage, because the clever and witty headlines have an internal consistency that I've come to enjoy and expect.
You can get long term results from AB tests long after the test has ended...
For example, you can see if Group A or Group B from a test are more likely to still use the site 1 year later.
You hypothesize that those ways to 'juice the metrics in the short term' hurt the user experience in the long term... Well if your hypothesis is right, these long term AB results should show it.
> For example, you can see if Group A or Group B from a test are more likely to still use the site 1 year later.
This isn't very feasible on most products and certainly limited by the amount of data collected.
I think that you have to take into account the popularity of these methods when evaluating whether to implement them. It would seem the more sites that do these obtrusive UI patterns the less effective they become. Anecdotally nearly every method described in the article is an automatic back button off the site for me.
Nothing torpedoes my opinion of a brand more effectively than one of those insulting "Yes, spam me!"/"No, I'm a moron who hates saving money" popups. Absolutely mind-boggling that any thinking person thought that was an okay way to talk to customers.
The way A/B testing is practiced by actual people in the actual world, it is fundamentally broken.
No one EVER tests for mean-reversion over time.
For 17 years I've watched companies do A/B tests. I doubt I've seen a single convincing, durable result in the whole time.
It doesn’t seem like the author has any hard data that supports his claim that long term LTV and K-factor losses outweigh short term conversion rate wins. Maybe I missed it? Without said evidence, it’s probably safe to assume his generalized claim is wrong in most cases.
Reminds me of this post: https://biggestfish.substack.com/p/data-as-placebo
Bright eyed PMs whispering "statistically significant" to themselves over and over as they nervously scan their data aggregation dashboards for wiggles.
Nobody gets a promotion for a long term oriented ab test. Hence short term is here to stay.
Saying you made a 10% purchase rate improvement in a month is an easy pay rise.
I have developed a personal strategy of ridding the Web of these things. Anytime it happens, I close the tab and move along. Very little of value is lost.
This is basically what I do. Anything that pops up or tries to grab my attention gets instant closed before I look at it and if I can't find the control to close it in 1s I just close the whole tab.
I'd say this is a sub-category of the saying: "There is nothing in this world that an MBA can't and won't make worse".
It happens when people change perspectives from building and sustaining businesses to exploiting and squeezing every employee, supplier, and customer for the last drop.
Please don't take HN threads into flamewar. The one you started here was particularly shallow and gratuitous.
This is just casual bigotry.
Calling that bigotry is offensive to anyone that has been subjected to actual bigotry. You should be ashamed.
Taking HN threads further into hellish flamewar is against the site guidelines, so please don't.
Attacking someone personally is not ok. We're trying to avoid the online callout/shaming culture here: https://hn.algolia.com/?sort=byDate&type=comment&dateRange=a...
Having a flamewar about the definition of a word is particularly pointless because different people have very different associations with the same word—especially when you remember that this place literally has commenters coming from all over the world.
Edit: you've unfortunately been breaking the site guidelines in other threads as well, e.g. here: https://news.ycombinator.com/item?id=32050947. We eventually ban accounts that do that, so please don't do that. If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.
I'll say that there are managers out there who do fantastic and important work, but they are the ones with a build-and-sustain perspective. If you feel offended by this comment, it's because you are in the squeeze-and-exploit group and you know it.
Wow, this is some really strange gaslighting.
I'm an Engineer and an MBA, very proud of both designations, and proud of almost all of my colleagues in both of those fields.
I'm a better person for knowing the people I worked and studied with.
I'm not hugely offended by the OP comment (or even yours), it's not a big deal, rather I'm pointing out that it is straight up a kind of misplaced bigotry - maybe not the best word, but it's correct. It's more ridiculous than anything.
Just re-read what you wrote: "because you are in the squeeze & exploit group and you know it"
Seriously? What the F is your problem? Why would you even conceive to write that to a random commenter on HN?
Do you folks not see this weirdly dark and perverse cynicism coming out here? What's wrong with you people?
I think maybe there is an odd, intellectually lazy thing happening whereby some people, possibly lacking the understanding of a lot of the mechanics of the 'business world', and knowing that 'bad business people exist' ... just want to throw it onto 'MBAs' for some strange reason, not understanding how odd and misinformed that rationalization is. It's really weird. Guys, stop this, it's just misinformed.
Please don't feed flamewars on HN, regardless of how bad another comment is or you feel it is. It just makes everything worse.
https://news.ycombinator.com/newsguidelines.html
Edit: you broke the site guidelines particularly badly later in the thread. We ban accounts that do that, so please don't do it again. More here: https://news.ycombinator.com/item?id=32072856.
The giraffe's neck is the result of A/B testing.
If you know its internal anatomy, you know what I mean.
If you're upset about internet retail, I hope you're also upset about milk being in the back of the store to get you to walk through the whole thing, because this has been merchandising's bread and butter for a very long time.
This is a long-standing and oft-repeated myth.
Milk is in the back of the store because that's where it makes sense to have a refrigerated wall.
Milk is increasingly available in smaller quantities in compact refrigeration units at the front of the store.
It's probably both. But where products are placed in a store is uncontroversially designed to sell more, not to make your life more convenient. https://www.npr.org/2014/08/01/337034378/everyone-goes-to-th...
Eggs and salt are a common dark pattern; milk is in the fridges with the dairy products in most of Europe, hard to miss.