Beachheads and Obstacles
stratechery.com

This article is written as if we don't already know whether Alexa (and its competitors) is the next big computing platform. We already know it isn't.
Alexa first came out in 2014. If it were the world-changing product people said it was going to be, it would have spawned multiple billion-dollar companies by now, much like smartphones led to Uber.
Amazon can keep pumping Alexa into 15 new products every year. Google will copy them, not because they actually know what a good product is, but because they're the internet-company version of Samsung: a fast follower. They're competitor-focused, so they'll do it just in case it takes off.
None of that will change that this product category isn't going anywhere.
On a website or in an app, there are specific affordances, e.g. buttons, dropdowns, GPS, and text boxes, that bound and steer user input to help the user complete the task.
For smart speakers like Alexa and Google Home, voice being the only input lets users say whatever they want, making the task space infinite. But voice recognition and NLP are not yet at a point where they can recognize everything the user has said. This creates a less-than-stellar experience, with the user having to repeat, rephrase, or, even worse, abandon the task. I think this platform will take off when NLP/AI can detect user intent with near-perfect accuracy and make the interaction as fluid as with a well-designed app. It doesn't hurt for Amazon to have a large installed base ready to use the platform if/when intent recognition reaches par.
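To make the contrast concrete, here's a toy sketch (all names and options are made up for illustration) of how a GUI affordance bounds the input space while voice input is open-ended:

```python
# A dropdown's options ARE its entire task space; voice must parse anything.
DROPDOWN_OPTIONS = ["small", "medium", "large"]

def gui_select(choice: str) -> str:
    """A dropdown physically cannot submit anything outside its options."""
    if choice not in DROPDOWN_OPTIONS:
        raise ValueError("unreachable in a real dropdown")
    return f"ordered {choice}"

def voice_select(utterance: str) -> str:
    """Voice must match an open-ended utterance and may simply fail,
    forcing the user to repeat, rephrase, or abandon the task."""
    for option in DROPDOWN_OPTIONS:
        if option in utterance.lower():
            return f"ordered {option}"
    return "Sorry, I didn't understand."
```

The GUI path can't fail on input; the voice path fails whenever the utterance doesn't contain a recognized option.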
Of course, it will never replace the phone or desktop, as there will be things we cannot say out loud (secrets), places where voice is not possible (loud environments), or situations where it is simply not courteous.
> This creates a less than stellar experience with the user having to repeat, rephrase or even worse abandon the task.
Not to mention: constant wondering whether the task can even be accomplished. When a voice assistant rejects your query, in many cases you can't be sure whether it's because it couldn't understand you, or because it can't possibly accept what you said as a valid input in the context it's in. In regular interfaces, visible constraints matter as much as affordances.
Norman would refer to these constraints as "signifiers", indicators of possible affordances. It's interesting how weak voice assistants are at signifying what you can actually do with them.
Thanks for introducing me to the term. Damn, I need to finally read that book.
The dev team can add helpful responses that signify the available set of voice commands: either suggesting tasks based on keywords recognized in the user's utterance, or simply telling users it didn't understand and that they can hear a list of actions by asking for help. (I've worked on published Alexa skills for several large tech companies.)
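A minimal sketch of that pattern, assuming a hypothetical skill with two intents (the intent names and phrasing are invented examples, not a real skill; `AMAZON.HelpIntent` and `AMAZON.FallbackIntent` are Alexa's built-in intents for help requests and unrecognized utterances):

```python
# Known intents and their spoken responses (hypothetical example skill).
KNOWN_COMMANDS = {
    "PlayMusicIntent": "Playing your music.",
    "WeatherIntent": "Here is today's forecast.",
}

def handle_intent(intent_name: str) -> str:
    """Return spoken text for a recognized intent, or a signifying fallback."""
    if intent_name in KNOWN_COMMANDS:
        return KNOWN_COMMANDS[intent_name]
    if intent_name == "AMAZON.HelpIntent":
        # Enumerate the task space out loud, since there is no screen to show it.
        return "You can say: play music, or what's the weather. What would you like?"
    # AMAZON.FallbackIntent or anything unrecognized: admit it and point to help.
    return "Sorry, I didn't catch that. Say 'help' to hear what I can do."
```

The key design choice is that the skill never dead-ends: every unrecognized utterance is routed toward a response that signifies what the skill can actually do.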
I think a cool, immersive middle ground will be smart surfaces embedded in wall materials that can display things: they could simply list out all available actions, or anthropomorphize the assistant as a virtual servant that follows you around, serving up facts and doing monotonous IoT actions for you.
Now, the privacy and surveillance implications of something like this are another story...
> Now the privacy and surveillance implications of something like this is another story...
Those would be resolved, here and elsewhere, if the industry could be made to stop trying to own people's data. It's not the data that should be a commodity; it's software and clouds.
The iPhone was released in 2007, but a smartphone that could technically have called a ride predates it by at least a few years. Uber was founded in 2009 but didn't actually release the product and app until 2011, after a beta in 2010 [0]. It didn't hit a $1B valuation until somewhere between February 2012 and August 2013 [1]. So: four years until the emergence of the X in "smartphones led to X", and six years until it was clear that X was a $1B opportunity. Unless you're completely aware of every product spawned by Alexa or voice assistants generally, I don't think this is a fair comparison.
Even so, Alexa and its competitors don't need to be computing platforms; they just need to be the interaction layer for day-to-day human-computer interaction. Even if Alexa were as technically simple as an IFTTT recipe with the "THIS" being "what the person says out loud", that's hugely valuable. It's very clear to me that if it's easier to do something with a device than without it, and easier still to make the device do the thing without physically touching it, that will ultimately be the way most people do the thing, provided they have access to the technology.
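That "IFTTT recipe where THIS is a spoken phrase" idea can be sketched in a few lines; the trigger phrases and action identifiers below are made-up examples, not a real Alexa or IFTTT API:

```python
from typing import Optional

# Hypothetical recipes: a spoken trigger phrase mapped to an action to fire.
RECIPES = {
    "turn on the lights": "lights.on",
    "lock the door": "door.lock",
}

def route_utterance(utterance: str) -> Optional[str]:
    """Map a spoken phrase to an action identifier, if any recipe matches."""
    text = utterance.lower().strip()
    for trigger, action in RECIPES.items():
        if trigger in text:
            return action
    return None  # no recipe matched; the assistant would fall back here
```

Even something this simple is an interaction layer: the value is in being the thing that hears the phrase, not in any computation behind it.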
That is why Amazon is pumping out so many different Alexa form factors. I couldn't figure out who would want the Alexa ring; then I met someone who hates earbuds and swears by his Swiss watch, and he was excited about a wearable he'd actually use. Amazon wants to be the dominant player and is simply crossing form factors off their list, (presumably) stack-ranked by opportunity size.
[0] https://en.wikipedia.org/wiki/Uber#History
[1] https://pitchbook.com/news/articles/uber-by-the-numbers-a-ti...
> The iPhone was released in 2007, but a smartphone that could technically have called a ride predates it by at least a few years.
From a technical perspective, there's no reason you couldn't have had a ride-hailing app on a Nokia Series 60 phone in 2006, or a Palm Treo 270 in 2002. All the necessary components were present in both of them. What was missing was a critical mass of users of those devices: a user base large enough for third parties to justify building services for them at scale.
Which could help explain why Amazon is pumping out Alexa devices in so many different form factors: they haven't yet cracked a single form factor that will attract that critical mass of users, but maybe if you aggregate enough small user bases, they'll add up to one large enough to attract third parties.
Yes, I completely agree with that. Particularly since a "voice assistant that helps you rely less on your smartphone OS" doesn't have the built-in distribution advantage that phones did (pretty much everyone had accepted that they needed a mobile phone), a diverse set of form factors matters (not everyone wants glasses, some people hate watches, etc.).
A few months ago, I was debating with a friend about the next major computing interface.
Is it voice or spatial computing (AR/VR/xR)?
It's clearly voice, for many reasons. To cut the argument short: realize that language is what separates humans from other animals, and that voice, not writing, is the natural form of language. The advancement of AI requires major advances in understanding human emotions, which are conveyed through subtleties in voice and not picked up in text.
But it doesn't need to be either/or. Why not both Voice and xR?
Over time, voice and xR will converge, as voice interfaces get integrated into more consumer services and xR gains more productive applications (right now it's all games and porn). But by then, Amazon will be well-positioned to get in on the fun, while Facebook will not be taken seriously.
Unless Facebook can disrupt the global monetary system, pushing Libra by providing a discount for all purchases made within Oculus.
I know the Portal wasn't exactly a huge hit in terms of sales (though reviews of the product's functionality alone were fairly positive, IIRC), but it's pretty clear evidence that Facebook is not ignoring voice as a piece of its xR portfolio. Oculus is pretty clearly the leader in consumer VR at the moment, and Portal sits within Boz's AR/VR org. I wouldn't count them out based on their ability to ship products quite yet. So unless you think people will avoid anything made by Facebook as a rule (possible, though I don't see much evidence of it at the moment), I still think they're in as good a position as they could hope for re: being the xR platform.
That Oculus is the leader in consumer VR (games and porn) does little to inspire confidence in Facebook as a hardware product design firm. Oculus was an acquisition, and we know Facebook is good at acquiring amazing companies.
But can Facebook create innovative products that people want? This is still an open question. After all, Facebook's success comes from out-executing on the ideas of others.
I haven't used a Portal, but isn't it video-focused? And doesn't it integrate Alexa?
I'm sure Facebook is aggressively trying to build a competing voice interface, but it's far behind Google, Amazon, and Apple. It will be harder to out-execute Amazon than Snapchat. And Facebook will need to improve its branding to even have a chance.
Oculus was acquired long enough ago to attribute pretty much all of its consumer success to Facebook's ability to ship. The Quest, for example, was designed and shipped entirely under Boz. They didn't buy Oculus and simply stop working.
Facebook deserves credit for iterating on solid foundations.
If they make VR useful outside of games and porn, then they will deserve credit for meaningful innovation.
To clarify why FB cannot be satisfied with the games-and-porn market: there's a big gender imbalance in those markets, and it runs against the demographics of Facebook's other apps, which have gender-balanced or even majority-female user bases.
Given its PR and branding issues, Facebook does not want to be a games-and-porn (techbro) company.