Bolt Graphics Targets FP64 HPC Workloads with Zeus GPU - HPCwire


A Silicon Valley chip startup named Bolt Graphics has completed tape-out of a test chip for Zeus, a new RISC-V GPU designed to address HPC, rendering, and other compute-intensive applications. The company says it’s on track to begin shipping Zeus, which it says will provide 20 teraflops of FP64 capacity on a single motherboard, by the fourth quarter of 2027.

Darwesh Singh founded Bolt Graphics in 2020 with the goal of delivering a chip that can power heavy-duty applications, like simulations and three-dimensional graphics, used by game designers, artists, scientists, and engineers. While today’s GPUs are powerful, the company argues that they were built to solve problems from decades ago and aren’t addressing the computing challenges that the graphics and simulation communities face today.

(Image courtesy Bolt Graphics)

“Clearly the problems in the ‘90s that GPUs were initially designed to solve are not the problems of today,” Singh said in a video posted to his company’s website. “A new type of GPU, something entirely groundbreaking, is needed to power the next 30 years of computer graphics and power the next generation of use cases.”

Singh said Bolt Graphics is taking a no-compromises approach to solving challenging rendering problems. For instance, modern GPUs can’t efficiently deliver the rasterization and ray tracing that video game designers and animation creators demand, he said. It can take hours to fully render short animated clips at 4K resolution and 120 fps, and it can take years to complete full-length animated films.

“With Zeus, we’re leapfrogging both rasterization and ray tracing to bring real-time path tracing,” Singh said. “Path tracing is the most advanced rendering technique providing the highest quality visual.”

With up to 384GB of expandable DDR5 memory, Zeus can also help researchers run larger simulations at full FP64 accuracy. The chip runs electromagnetic simulations 300x faster than the Nvidia Blackwell B200 with IEEE-754 FP64 accuracy, Bolt Graphics claims.

Bolt Graphics is delivering 20 teraflops of FP64 with its Zeus 4c offering (Image courtesy Bolt Graphics)

“Zeus is multiple orders of magnitude faster than legacy GPUs in performing these key physics simulations without trading out performance for accuracy,” Singh said. “In fact, every Zeus GPU, whether consumer or enterprise, has full FP64 cores designed to efficiently run HPC workloads.”

The GPU design for which Bolt Graphics just finished tape-out uses established semiconductor processes, including TSMC’s 12nm FinFET Compact (12 FFC) process. Bolt Graphics says Zeus’s scalable architecture also addresses advanced nodes, including 5 nm. The chip includes scalar cores, vector cores, and other specialized processors. The company is packaging its GPUs using one, two, or four chiplets per board for the Zeus 1c, Zeus 2c, and Zeus 4c offerings. While Zeus 1c and 2c fit on a PCIe card, Zeus 4c is too big and requires a full motherboard.

According to Bolt Graphics’ Zeus Spec Sheet, the high-end Zeus 4c will deliver 20 teraflops of vector FP64 capacity while consuming 500 watts of power. Customers will be able to put dozens of Zeus 4cs into a single server, addressing up to 9 TB of memory in a scale-up configuration. Zeus cards will include a 400 GbE interface (optionally 800 GbE), enabling customers to build scale-out clusters composed of thousands of GPUs, Bolt Graphics says on its website.

The FP64 capacity of the high-end Zeus 4c offering is in the ballpark of Nvidia Hopper H100 and H200 GPUs, which delivered 34 teraflops of FP64 within a similar power envelope (about 350 watts). With its Blackwell B100 and B200 GPUs, Nvidia delivered 30 teraflops and 37 teraflops of FP64 capacity, respectively, but the power demand essentially doubled to 700 watts. Nvidia’s new Rubin GPU will deliver 33 teraflops of FP64 capacity while consuming up to 2,300 watts per GPU.
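To put those numbers on a common footing, the short Python sketch below divides each quoted FP64 figure by its quoted power draw to get gigaflops per watt. The teraflops and wattage values are the ones cited in this article (vendor claims, not measurements), the Zeus 4c entry uses the 500-watt spec-sheet figure, and the Rubin entry uses the "up to" 2,300-watt ceiling, so its result is a lower bound. The names `CHIPS`, `gflops_per_watt`, and `efficiency_table` are illustrative only and do not come from any vendor tool.

```python
# FP64 teraflops and power (watts) as quoted in the article.
# These are vendor-stated or reported figures, not benchmark results.
CHIPS = {
    "Zeus 4c": (20.0, 500.0),
    "H100/H200": (34.0, 350.0),
    "B100": (30.0, 700.0),
    "B200": (37.0, 700.0),
    "Rubin": (33.0, 2300.0),  # "up to" 2,300 W, so efficiency is a lower bound
}


def gflops_per_watt(tflops: float, watts: float) -> float:
    """Convert teraflops to gigaflops and divide by power draw."""
    return tflops * 1000.0 / watts


def efficiency_table(chips=CHIPS) -> dict:
    """Return FP64 gigaflops per watt for each chip in the mapping."""
    return {name: gflops_per_watt(t, w) for name, (t, w) in chips.items()}


if __name__ == "__main__":
    # Print chips from most to least efficient on this single metric.
    ranked = sorted(efficiency_table().items(), key=lambda kv: kv[1], reverse=True)
    for name, eff in ranked:
        print(f"{name:10s} {eff:6.1f} GFLOPS/W")
```

On these quoted figures, Zeus 4c works out to 40 gigaflops of FP64 per watt, compared with roughly 97 for H100/H200, about 53 for B200, and about 14 for Rubin at its maximum power draw. Peak FP64 per watt is only one metric and says nothing about memory capacity, price, or sustained application performance.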

Obviously, the newer GPUs from Nvidia have oodles of AI capacity, which generally runs at lower 4-bit and 8-bit precisions. Nvidia is counting on the Ozaki emulation scheme to deliver FP64-like math capabilities using lower precision cores. However, not everyone is happy with Ozaki, and this has led to some concerns in the HPC community that native vector FP64 capacity needed for traditional modeling and simulation workloads is being sacrificed to bolster AI capacity.

The Zeus Spec Sheet (Image courtesy Bolt Graphics)

This concern over native FP64 capacity is something that AMD is addressing with its upcoming MI430X GPU, which the Department of Energy will be using for the upcoming Discovery supercomputer to be installed at Oak Ridge National Lab in 2028. The MI430X likely will have around 200 teraflops of FP64, according to estimates.

“Compute demand is growing exponentially, but cost remains the limiting factor,” Singh, who is also CTO and CEO of the Sunnyvale, California-based company, said in a press release. “We believe the next generation of computing will be defined not just by performance but by efficiency. Our goal is to fundamentally change the economics of compute and become the default platform for next-generation workloads.”

Bolt Graphics said it has a product pipeline exceeding $500 million and over 14,000 members in its early access program, including enterprises, developers, and end users. It’s not clear if government labs that are hungry for FP64 capacity are part of this program, but it wouldn’t be surprising if they were. For more information, see the company’s website at https://bolt.graphics.