Smartphone cameras and some smart glasses allow users to query AI models and receive answers about what they’re looking at. Soon, that capability could expand to other devices, including earbuds.
Researchers at the University of Washington have developed a pair of earbuds they call VueBuds that integrate a small, low-resolution camera into each earbud. The prototype earbuds offer features similar to those of smart glasses such as the Ray-Ban Meta: translating signs in foreign languages, acting as an aid for low-vision wearers, or identifying plant species during a hike.
Smart glasses have their drawbacks, including privacy concerns and comfort. Their under-the-radar cameras have drawn criticism and lawsuits over concerns that they can record unsuspecting bystanders, and over what ultimately happens to the sensitive visual data they capture.
And not everyone likes wearing glasses—some even opt for contact lenses to avoid having to wear them, including Shyam Gollakota, the University of Washington professor who led the VueBuds research. “The one predominant wearable which almost everyone wears is your earbuds,” he says. His team presents earbuds as an alternative to smart glasses that’s less intrusive and better for privacy.
The primary goal of the research, however, was to demonstrate that this small, ear-worn form factor is even possible. “Traditionally, earbuds have been limited to audio interfaces,” Gollakota says. “We show that we can indeed build a system within that form factor and get lots of intelligence by running visual language models.”
The research was presented today at the ACM Computer-Human Interaction conference in Barcelona.
Why Earbuds Are an Ideal Smart Device
Gollakota and his colleagues don’t expect VueBuds to be the only interface for visual AI.
“Wearables are very personal,” says Maruchi Kim, a Ph.D. student in Gollakota’s lab. Some people may prefer glasses or watches, others might like rings, and so Kim suspects there won’t be one device to rule them all. “We’re just trying to introduce another category to demonstrate that everything smart glasses do can be achieved on [earbuds].”
That said, the interface may have some advantages. Because they’re already widely used, people may be more likely to adopt the technology. Plus, Kim says, “there’s already a social paradigm for putting your earbuds away in their case.” Smart glasses may have prescription lenses, so the wearer would keep them on all the time. But “if you ever want to be confident that these cameras aren’t recording, earbuds are a nice form factor that lets you just tuck it away when you’re ready.”
Many of the AI features users indicate an interest in are also “episodic use cases,” Kim says. To translate a street sign or ingredients on a package, for instance, you don’t need a continuous video stream.
Key Challenges for Earbuds With Cameras
There are three key challenges to making vision-capable earbuds possible, Gollakota says: Fitting the camera within strict size, power, and weight constraints; transmitting the data; and creating a complete visual scene when worn in the ears.
Cameras typically draw a lot of power, making this the number-one concern. “The batteries in your earbuds are about 10 times as small as what you have on smart glasses,” Kim says. Visual data also requires much higher bandwidth than audio, so video recorded by glasses is typically sent over Wi-Fi to be processed by cloud-based AI models. Wi-Fi allows for high bandwidth, but it consumes more power.
VueBuds transmit low-resolution, grayscale images over Bluetooth. While most device makers try to transmit as much data as possible, Gollakota’s team took a different approach: they wanted to find the lowest resolution at which a visual language model could still extract useful information, ultimately opting for a 324-by-324-pixel image sensor.
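To get a sense of why this design fits within Bluetooth's limits, here is a back-of-the-envelope sketch. The 324-by-324 resolution comes from the researchers; the Bluetooth link rate is a rough ballpark figure I've assumed for illustration, not a measurement from the prototype.

```python
# Back-of-the-envelope sketch of why a tiny grayscale sensor suits Bluetooth.
# The 324x324 resolution is from the VueBuds prototype; the link rate below
# is an assumed nominal figure, not a measured throughput.

GRAYSCALE_BYTES_PER_PIXEL = 1          # 8-bit monochrome
frame_bytes = 324 * 324 * GRAYSCALE_BYTES_PER_PIXEL
frame_bits = frame_bytes * 8

BLUETOOTH_BPS = 2_000_000              # ~2 Mbit/s nominal (assumed)

seconds_per_frame = frame_bits / BLUETOOTH_BPS

print(f"Raw frame size: {frame_bytes / 1024:.0f} KiB")
print(f"Transfer time over Bluetooth: {seconds_per_frame:.2f} s")
```

Even uncompressed, a single grayscale frame at this resolution is only around 100 KiB, so it can cross a Bluetooth link in well under a second, which suits the episodic, still-image use cases described above.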
Integrated black-and-white cameras and a Bluetooth connection to a phone-based visual AI model make these earbuds an alternative to smart glasses.
Beyond the power and bandwidth concerns, the researchers also had to make sure earbud cameras could see enough. Placing cameras at the ears creates a blind spot on either side where the face blocks each camera’s view. But by setting the cameras at a slight angle (5 or 10 degrees) away from the face and stitching together images, the team found they could reconstruct a more complete scene with a wide field of view. This does, however, leave a small blind spot directly in front of the user, for objects closer than about 20 centimeters from the face.
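The frontal blind spot falls roughly where simple geometry would predict. Here is an illustrative sketch, not taken from the paper: the head half-width, per-camera field of view, and tilt angle are all assumed values chosen to show the shape of the trade-off.

```python
import math

# Rough geometry sketch (assumed numbers, not from the paper): two cameras
# at the ears, each tilted slightly away from the face. Their views overlap
# beyond the distance where each camera's inner field-of-view edge crosses
# the head's midline; anything closer is in neither camera's view.

HEAD_HALF_WIDTH_CM = 7.5   # assumed ear-to-midline distance
FOV_DEG = 60.0             # assumed horizontal field of view per camera
TILT_DEG = 10.0            # outward tilt away from the face (article: 5-10)

# Angle of the inner FOV edge relative to straight ahead:
inner_edge_deg = FOV_DEG / 2 - TILT_DEG

# Distance ahead of the face at which that edge reaches the midline:
blind_spot_cm = HEAD_HALF_WIDTH_CM / math.tan(math.radians(inner_edge_deg))
print(f"Approximate frontal blind spot: {blind_spot_cm:.0f} cm")
```

With these assumed values the blind spot comes out near the roughly 20 centimeters the researchers report; a wider field of view or smaller tilt would shrink it.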
The researchers tested the earbuds with four different visual language models. In user studies with the best-performing model (Qwen2.5-VL), VueBuds achieved about 82 percent accuracy for object recognition, 94 percent for character recognition, 84 percent for translation, and 87 percent accuracy overall. The earbuds performed comparably to Ray-Ban Meta glasses across 17 tasks.
In the future, the team hopes to add color to the system. Kim is also looking into increasing the achievable resolution by incorporating an on-device JPEG encoder, which would significantly reduce the size of the images sent for processing.
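A quick sketch shows why on-device compression matters here. The 10:1 compression ratio is an assumed, conservative figure for JPEG on photographic content, not a number from the VueBuds work.

```python
# Rough illustration of the JPEG-encoder idea: at an assumed 10:1
# compression ratio (typical ballpark for JPEG; actual ratios depend on
# image content and quality settings), the same byte budget that today
# carries one raw 324x324 frame could carry a far larger compressed frame.

raw_324_bytes = 324 * 324          # current raw grayscale frame
jpeg_ratio = 10                    # assumed conservative compression ratio

# The same frame, JPEG-compressed:
compressed_bytes = raw_324_bytes // jpeg_ratio

# How many pixels the raw-frame byte budget could carry if compressed first:
pixels_in_same_budget = raw_324_bytes * jpeg_ratio
side = int(pixels_in_same_budget ** 0.5)

print(f"Compressed 324x324 frame: ~{compressed_bytes // 1024} KiB")
print(f"Same byte budget could carry roughly a {side}x{side} image")
```

Under these assumptions, compression frees enough of the Bluetooth budget to roughly triple the linear resolution, which is why an on-device encoder is an attractive next step.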
Privacy Concerns for Smart Earbuds
Many users have been wary of privacy and surveillance concerns with smart glasses. Those worries are intensifying with new evidence that the companies building these glasses may be mishandling the data they capture.
Given those concerns, should we add cameras to yet another wearable device? The University of Washington researchers say VueBuds’ stripped-down image capture is a boon for privacy compared to today’s smart glasses.
For one thing, the system is designed to run on a smartphone or other local device, so data never goes to the cloud, Gollakota says. VueBuds also capture only still images. One of the main uses of Meta’s smart glasses is now recording video, but, he adds, “no one wants to see a low-resolution grayscale video in the first place.”
Additionally, VueBuds are activated by voice commands. “That audio initiation means that everyone around you would know what you’re actually asking.” Smart glasses, meanwhile, can start recording with the touch of a button.
Gollakota notes that most people have become accustomed to having microphones in nearly every device, because they provide enough utility through capabilities like voice commands and “a trust has been built” with companies, like Apple, that sell devices with built-in microphones. Whether the same paradigm will emerge for visual intelligence remains to be seen as the technology, and our level of trust in it, evolves over the next few years.
Apple is also rumored to be developing next-generation AirPods that integrate infrared cameras to enable gesture recognition and improve spatial audio. These wouldn’t have the visual intelligence capabilities made possible by standard cameras, but the move would signal growing interest in expanding the capabilities of what has traditionally been an audio-only interface.
Earbuds are “the most successful wearable we have today, and right now it’s limited to being an audio interface,” Gollakota says. “Bringing visual intelligence would make it a much richer and more powerful interface than what it currently is.”