Disclaimer: This article was written and edited without the use of LLMs.
In season one of House of Cards, Raymond Tusk, a billionaire businessman, has hyper-optimized his business to the point where he never has to leave his office. As he puts it:
I have eight people representing me at eight simultaneous meetings in six time zones right now. I sit here and answer their questions, provided they come in the form of a single yes or no proposition.
Although this is a bit of a gag (try to imagine all the ways you can write Tusk answering a phone call while keeping it interesting), it’s also a very high-leverage way to work. It follows these principles:
Delegate most of the busywork / low-risk decision-making to others
Tusk doesn’t need to travel to these meetings, make small talk, or reiterate decisions that have already been made prior to the meeting. He only needs to react to new information.
When your input is needed to provide leverage, get it in the form that’s easiest to act on, even if that creates extra work for subordinates.
Tusk has his people simplify questions until they have yes-or-no answers. This saves Tusk effort, but creates more work for his subordinates, who have to constrain the scope of each question to fit this format.
In the era of abundant intelligence, we might all have our own systems like Raymond Tusk’s. Let’s explore the pieces that are needed for this, and where we currently stand on them.
For the purposes of this article, abundant intelligence refers to a state of the LLM ecosystem where high-quality output from large language models is cheap enough that cost isn’t a consideration when deciding to use them. Right now, we’re not in this state: manufacturing output from TSMC is constrained, and LLM inference still has room for optimization.
In computing, we’ve progressed from punchcards to keyboard inputs to GUIs controlled by mice or touchscreens. Technically, there’s nothing we can do with a GUI that we can’t do with punchcards. But, with advances in UX, computers have been doing more and more work to make usage easier for people. Still, GUIs have a bit of a discoverability problem (you have to strike a balance between showing everything you can do and not making the UI overwhelming), and some accessibility problems - sometimes your hands are full, or you just don’t want to look at a screen.
Enter the lost decade of trying to make voice assistants work. Voice has always been a higher bandwidth form of communication than keyboard text input; it’s just that it used to be unreliable, so we slowed down communication for the sake of making things easier for computers. Siri, Google Assistant, and Alexa were supposed to be the next thing, but they just weren’t smart enough. People quickly realized that 2010s natural language processing wasn’t enough to figure out what you wanted to do when you deviated from a couple of set keyphrases for triggering actions. Once people lost that trust, these assistants were relegated to setting timers, asking for the weather, and turning on the lights. Amazon in particular lost tens of billions of dollars1 on its devices division, much of which is Alexa, with the pipe dream of people making Amazon orders on their Alexa devices staying a pipe dream. In terms of synthesizing knowledge, the best these assistants ended up doing was making a Google search and reading the results back to you. Until now.

With speech-to-text models + LLMs, voice becomes a much more viable input method:
LLMs can take homonyms, verbal slips, and mistranscriptions and interpret intent and meaning out of them.
LLMs should be able to efficiently ingest the context of a discussion or person’s history and decide when it’s appropriate to use highly personal, technical or obscure vocabulary.
They can be trained to orchestrate tool calls (sending/receiving data from external sources) and integrate the results into a final response.
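To make that concrete, here’s a minimal sketch of the voice → LLM → tool-call loop. The functions (transcribe, call_llm, run_tool) are hypothetical placeholders, not any particular vendor’s SDK; the point is the shape of the loop, not the exact API.

```python
# Hypothetical sketch of a voice -> LLM -> tool-call loop; no real SDK is implied.

def transcribe(audio: bytes) -> str:
    """Speech-to-text; the transcript may contain homonyms or slips."""
    raise NotImplementedError  # swap in your STT model of choice

def call_llm(messages: list[dict], tools: list[dict]) -> dict:
    """Returns {'tool': name, 'args': {...}} or {'text': final_reply}."""
    raise NotImplementedError  # swap in your LLM of choice

def run_tool(name: str, args: dict) -> str:
    """Dispatch to calendar, messaging, search, etc."""
    raise NotImplementedError

TOOLS = [
    {"name": "get_calendar", "description": "Read today's meetings"},
    {"name": "web_search", "description": "Look something up"},
]

def handle_utterance(audio: bytes, history: list[dict]) -> str:
    # The LLM sees the raw (possibly imperfect) transcript plus prior history,
    # and infers intent from context rather than matching fixed keyphrases.
    history.append({"role": "user", "content": transcribe(audio)})

    # Let the model request tools until it can produce a final answer.
    while True:
        reply = call_llm(history, TOOLS)
        if "tool" in reply:
            result = run_tool(reply["tool"], reply["args"])
            history.append({"role": "tool", "content": result})
            continue
        history.append({"role": "assistant", "content": reply["text"]})
        return reply["text"]
```

The important part is the loop: the model can keep requesting tools until it has enough context to answer, and the raw transcript never needs to be perfect.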
In the era of abundant intelligence, we will use orders of magnitude more compute to maximize the bandwidth of communication between people and computers. The only medium with even higher bandwidth than voice would be direct neural activity, but I hope we won’t need to go that far.
Voice is looking viable enough for OpenAI to be betting on it. Per The Information, an upcoming OpenAI x io device will be screenless, always-listening, and proactive2. Lots of effort is being put into improving OpenAI’s audio models and infrastructure to enable this device. Meanwhile, the old guard is refreshing itself, with Alexa+, Apple Intelligence, and Gemini for Home.
Audio-native models should also become smart enough one day, but it’s harder to train native audio functionality3, and they might not even be needed to unlock voice as a first-class interface. They’ll probably be better suited for emotional tasks, or as a funnel to a more intelligent LLM.
Raymond Tusk’s forcing of questions into only having yes/no answers is a bit too strict, and is also an artifact of having to take all queries briefly over the phone. If you’re able to look at a display, you’re able to ingest data that is multimodal and more complicated. As a result, what we’re currently seeing with coding agents is multiple choice with optional free text input. Here’s a screenshot of Cursor offering choices during a planning phase:
You absolutely need free text input available, because if the assistant could be relied upon to always include at least one correct choice, you’re probably not pushing the model to the edge of its capability. But on the border where the assistant is *still pretty good*, users being able to select a multiple choice option most of the time is very low friction.
Coming up with one solution that is right 60% of the time is good, but spending more tokens to come up with three potential solutions, at least one of which is right ~80% of the time, is better. In a world of pure text, this is probably the best format for UX.
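A minimal sketch of this pattern, assuming a hypothetical propose_plans call that asks a model for a handful of candidate plans: number the candidates, and keep a free-text escape hatch for when none of them is right.

```python
# Sketch of "multiple choice with optional free text"; propose_plans is hypothetical.

def propose_plans(task: str, n: int = 3) -> list[str]:
    """Ask a model for n distinct candidate plans for the task."""
    raise NotImplementedError

def choose_plan(task: str) -> str:
    plans = propose_plans(task)
    for i, plan in enumerate(plans, start=1):
        print(f"[{i}] {plan}")
    print("[0] None of these")

    answer = input("> ").strip()
    if answer.isdigit() and 1 <= int(answer) <= len(plans):
        return plans[int(answer) - 1]  # low-friction path: one keypress
    if answer in ("", "0"):
        answer = input("Describe what you want instead: ").strip()
    return answer  # free-text escape hatch
```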
But we’re not really in a world of pure text anymore. Among the major advancements of 2025 in LLMs were how good fast LLMs were getting and how fast good LLMs got. We’re at the point where simple HTML UIs can be built on demand (ex: a Gemini 2.5 Flash Lite demo at ~461 tokens/second). With specialized hardware, companies like Cerebras can serve gpt-oss-120b, a decent model, at 3,000 tokens per second. Anthropic and OpenAI now let you pay more to get tokens from their SOTA models faster.
Text is an efficient way for LLMs to communicate with people, but we will soon no longer need that efficiency. In the era of abundant intelligence, we can afford to burn an order of magnitude more tokens to make content more digestible for users.
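As a toy illustration of that trade, here’s a sketch that spends extra tokens from a hypothetical fast_llm call to wrap a plaintext answer in a throwaway HTML view (the function and prompt are illustrative, not any particular vendor’s API):

```python
# Sketch of spending extra tokens on presentation; fast_llm is hypothetical.
import webbrowser
from pathlib import Path

def fast_llm(prompt: str) -> str:
    """A high-throughput model call (flash-class, Cerebras-hosted, etc.)."""
    raise NotImplementedError

def render_answer(answer_text: str) -> None:
    html = fast_llm(
        "Turn this answer into a single self-contained HTML page with "
        "headings, a summary table, and collapsible details. Return only HTML.\n\n"
        + answer_text
    )
    path = Path("answer.html")
    path.write_text(html, encoding="utf-8")
    webbrowser.open(path.resolve().as_uri())  # an instant, throwaway UI
```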
While a lot of LLM output is best suited to stay plaintext, there’s a long tail of interactivity paradigms outside of plain text that would help users understand LLM output more effectively.
Reactive UIs - instead of telling a user “it depends” or making assumptions on the user’s behalf, an LLM can show all of the dimensions that impact an answer, and let the user tweak them with instant feedback (see the sketch after this list).
Drag and drop
Dropdown menus
Presentationmaxxing - think of the slickest landing page, slides, or keynote that kept you engaged and maximized the use of every single pixel on your display. There’s no reason why your LLM output can’t be as interesting.
Multimedia - images, gifs, and animations
Information density - tailored to individual preferences and context, emphasizing what needs to be emphasized and info-dumping where the text can be skimmed.
Short form video - half jesting here, but if every form of media consumption eventually evolves into a short form video (just like how many species evolve to become more crab-like), maybe the consumption of work will move in this direction too.
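To make the “Reactive UIs” item concrete, here’s a toy sketch using ipywidgets in a notebook. The cost formula and slider ranges are made up; in a real system the model would emit the dimensions and the formula, and the client would render the controls.

```python
# Toy reactive answer to "what will this feature cost?"; values are illustrative.
from ipywidgets import FloatSlider, IntSlider, interact

def monthly_cost(requests_per_day: int, tokens_per_request: int,
                 price_per_million_tokens: float) -> None:
    # Recomputed instantly whenever the user drags a slider.
    monthly_tokens = requests_per_day * 30 * tokens_per_request
    print(f"~${monthly_tokens / 1e6 * price_per_million_tokens:,.0f} per month")

interact(
    monthly_cost,
    requests_per_day=IntSlider(min=100, max=100_000, step=100, value=5_000),
    tokens_per_request=IntSlider(min=100, max=10_000, step=100, value=1_500),
    price_per_million_tokens=FloatSlider(min=0.1, max=30.0, step=0.1, value=3.0),
)
```

Dragging any slider recomputes the answer instantly, which is the kind of “it depends” a block of text can’t give you.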
Google’s showcase of generative UI is excellent: https://generativeui.github.io/
Voice input will become a first-class input method when we’re able to use enough compute on it. It’s the highest-bandwidth communication method we have, but it hasn’t been good enough to trust yet. Output-wise, there’s much more that applications can do visually than dump a block of formatted text. It might take significantly more compute to make output only marginally more understandable, but (1) hey, isn’t that what the multi-trillion dollar infrastructure buildout is for? And (2) marginal improvements to understanding at every step of a long chain of reasoning add up to massive impact.
As always, we will continue to climb levels of abstraction. I really like Cursor’s explanation of their recent computer use + cloud agent feature update:
Cursor now shows you demos, not diffs.4
Somewhere between that and “Your agent shows you money being sent to your bank account, not business ideas” is where we’re headed next.
After years (arguably decades) of stagnation, it’s exciting to be on the cusp of a genuine change in how we interact with computers. And all it’ll take is better models, better model harnesses, and a generational infrastructure buildout.

