Ahead of the full launch of Tenstorrent’s new generation of cluster-scale systems next week, EE Times got a first look at the company’s video generation demo. Based on Wan2.2-14B, the demo generated a five-second, 720p video (81 frames, 40 denoising steps, using an optimized version of the model) from a text prompt in three seconds when EE Times tried it out. The company said its record of 2.4 seconds is around 10× faster than the same production-grade model running on other leading hardware.
The demo runs an optimized version of Wan2.2-14B created by partner Prodia, an AI cloud company specializing in fast image and video generation. It runs on four of Tenstorrent’s new Blackhole-generation Galaxy servers (128 Blackhole chips in total).


Tenstorrent’s AI hardware architecture unifies compute, memory, and networking so that very large systems can run a single software program, without proprietary interconnects, reconfiguration, or workload-specific design.
While there’s a lot of focus on code generation today, video generation will take off imminently and drive entire markets, Tenstorrent senior fellow Jasmina Vasiljevic told EE Times.
“We’re pretty excited about video,” Vasiljevic said. “We see that with more Galaxies, we can tackle bigger resolutions and longer videos.”
One of the reasons video applications have stalled is that they require too much compute, she said.
“It used to take many minutes to generate a single video; now we can produce five seconds of video in two and a half seconds, which is better than real time,” she said. “Breaking the real-time barrier will unlock new verticals.”
A big bottleneck for video generation inference today is iterative denoising: the work within each step can be parallelized across frames, but the steps themselves must run in sequence. New research is exploring autoregressive or predictive models that predict future tokens or frames much as text LLMs do, an approach that is less sequential in nature. Tenstorrent sees the future of video generation as a combination of both approaches.
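To make the distinction concrete, here is a minimal Python sketch of the two generation loops. It is illustrative only: denoise_step and predict_next_frame are hypothetical placeholders standing in for model forward passes, not Tenstorrent’s or Prodia’s code, and the frame and step counts simply mirror the demo’s settings.

    # Diffusion-style loop: all 81 frames are refined together, but the
    # 40 denoising steps form a dependency chain; step i needs step i-1.
    def denoise_step(latents, step):
        return [x * 0.9 for x in latents]   # placeholder for a model forward pass

    latents = [1.0] * 81                    # one latent per frame (parallel within a step)
    for step in range(40):                  # sequential across steps
        latents = denoise_step(latents, step)

    # Autoregressive loop: frames emerge one at a time, like tokens from
    # an LLM, so output can stream before the whole clip is finished.
    def predict_next_frame(frames):
        return frames[-1] + 1.0             # placeholder for a predictive model

    frames = [0.0]                          # seed frame
    for _ in range(80):
        frames.append(predict_next_frame(frames))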
“The likelihood of these algorithms evolving aggressively to combine is very high,” Vasiljevic said. “We’re excited because [Tenstorrent] is good at both.”
Video customers
Tenstorrent CEO Jim Keller told EE Times that the company is working with some video-focused customers, including some of the big frontier AI labs.
“We’re pretty sure we’re the only ones who’ve been able to do [generation of video] at this kind of speed, and at these cost numbers,” he said. “Our mission right now is just to keep making it easier to do and easier to deploy, but the actual models run pretty solidly and are stable.”
Many video customers are running minor tweaks or iterations of core models, he said, so they can build on the work Tenstorrent has already done on models such as Wan2.2-14B.
“The engine underneath is pretty common, which is cool because it makes porting relatively straightforward,” he said. “The big diffusion models have been around for a while, and everybody knows how they work, but running them across 128 chips in parallel at high utilization is new.”
Common ground across today’s video generation models means customers can get up and running quickly, especially given Tenstorrent’s fully open-source software stack. But the real aim is for the architecture to be general-purpose enough to handle the next generation of models, too, Keller said.
“Our mission has always been general-purpose AI computing… we’re anti-specialized,” he said. “With a lot of chips, we have a lot of SRAM, but all the chips also have DRAM, and they are highly networked together, so our platform is more general-purpose. The world is talking about specialized, specialized, specialized, and that should be terrifying [everyone] because as models change, that specialized hardware is not going to work.”
Benchmarks for other workloads are expected at Tenstorrent’s at-scale compute launch next week.
- We updated the article on 22/4/26 after Tenstorrent contacted us to clarify the number of chips needed to run the video generation demo.
See also:
Layoffs At Tenstorrent As Startup Pivots Towards Developer Sales
Sally Ward-Foxton covers AI for EETimes.com and EETimes Europe magazine. Sally has spent the last 18 years writing about the electronics industry from London. She has written for Electronic Design, ECN, Electronic Specifier: Design, Components in Electronics, and many more news publications. She holds a master’s degree in Electrical and Electronic Engineering from the University of Cambridge. Follow Sally on LinkedIn