Honeydiff: Fast, Rich Image Diffing for Modern Visual Testing


When I started building Vizzly, I knew visual regression testing needed to feel different. Not just another bolt-on testing tool, but something integrated into the development workflow. The kind of thing you’d actually want to use during local TDD, not just in CI.

One of the most powerful features a visual testing tool can offer is handling live data. Real applications aren’t static — content changes, users interact, and APIs return new results every time. The best tools can tell the difference between expected data changes and real visual bugs, keeping tests stable without resorting to mock fixtures or frozen snapshots.

(Random aside, LLMs will take the em dash from my cold dead hands 😆)

Initially, I used odiff for Vizzly’s image comparisons. It’s an excellent tool. Fast, accurate, and well-designed. But as I built out the visual testing workflow, I needed more than “different” or “same.” I needed spatial clustering to know where changes are. Intensity statistics to understand the magnitude. SSIM (Structural Similarity Index) scoring for smarter live data testing. And I needed it fast enough for local TDD workflows.

Building these capabilities directly into the diffing engine made the whole system better. The diff UI could show you exactly where changes clustered. The comparison logic could use perceptual scoring to be smarter about what matters. The performance meant you could actually use it in local workflows.

So I built Honeydiff.

What Makes Honeydiff Special

Let me break down what makes Honeydiff different, and why it’s the foundation we needed for Vizzly’s visual development workflow.

Performance Built for Real-Time Workflows

Honeydiff is fast. Really fast. In benchmarks against odiff 4.1.1 — the Zig rewrite with SIMD optimizations — Honeydiff is 9-16x faster on real-world screenshots. Full HD images process in 15ms vs 240ms. Tall dashboard screenshots (2.5 million pixels) complete in 20ms vs 240ms. Even massive 18-million-pixel scrollable pages finish in 80ms vs 710ms.

Why does this matter? When you’re running vizzly tdd locally and making visual changes, you need instant feedback. Right now. Fast enough that it feels like part of your development process, not a separate testing step. Processing a massive 18-million-pixel comparison in 80 milliseconds means you’re never waiting on the diff engine.

I wrote it in Rust and leveraged Rayon for parallel processing. Every row of pixels gets compared in parallel across your CPU cores, achieving 260-309% CPU usage (effectively using 2-3 cores). Even with anti-aliasing detection enabled, comparisons are only about 10% slower while producing 4x fewer false positives.
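To make that concrete, here's a minimal sketch of the row-parallel pattern. This is not Honeydiff's actual hot loop (which also handles tolerance, AA detection, and diff output); the function name and layout assumptions are illustrative:

```rust
use rayon::prelude::*;

/// Count differing pixels by comparing rows in parallel with Rayon.
/// A minimal sketch of the pattern only; names are illustrative.
fn count_diff_pixels(a: &[u8], b: &[u8], width: usize, height: usize) -> u64 {
    let row_bytes = width * 4; // RGBA, row-major
    (0..height)
        .into_par_iter() // Rayon fans the rows out across CPU cores
        .map(|y| {
            let row_a = &a[y * row_bytes..(y + 1) * row_bytes];
            let row_b = &b[y * row_bytes..(y + 1) * row_bytes];
            row_a
                .chunks_exact(4)
                .zip(row_b.chunks_exact(4))
                .filter(|(pa, pb)| pa != pb) // any channel differs
                .count() as u64
        })
        .sum()
}
```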

What’s interesting is that the comparison itself is incredibly fast — most of the time in real-world usage goes to PNG encoding when generating diff artifacts. The actual pixel comparison for an 18-million-pixel image? 80 milliseconds. That’s the kind of performance that makes rich metrics like SSIM and spatial clustering viable in local development workflows, not just in slow batch processing jobs.

Accurate Anti-Aliasing Detection

Honeydiff uses odiff’s anti-aliasing detection algorithm, but with a more conservative filter. In testing on real screenshots, Honeydiff’s AA detection is more selective about what it dismisses: it catches more real visual differences while still filtering out rendering artifacts like font smoothing and edge anti-aliasing.

The algorithm checks if a pixel has similar neighbors in both images (using 8-connectivity). If it does, it’s likely an anti-aliasing artifact from font rendering or edge smoothing. If it doesn’t, it’s a real visual change worth flagging.
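Here's a simplified sketch of that check. The Image type, the closeness test, and the function names are all illustrative assumptions, and the production algorithm (inherited from odiff) applies additional checks, but the core idea is the neighbor test in both images:

```rust
/// Minimal RGBA image for illustration; Honeydiff's internal types differ.
struct Image {
    width: i32,
    height: i32,
    data: Vec<u8>, // RGBA, row-major
}

impl Image {
    /// Pixel at (x, y), or None when out of bounds.
    fn get(&self, x: i32, y: i32) -> Option<[u8; 4]> {
        if x < 0 || y < 0 || x >= self.width || y >= self.height {
            return None;
        }
        let i = ((y * self.width + x) * 4) as usize;
        Some([self.data[i], self.data[i + 1], self.data[i + 2], self.data[i + 3]])
    }
}

/// Channel-wise closeness test (a stand-in for the real color metric).
fn close_enough(a: [u8; 4], b: [u8; 4], tol: u8) -> bool {
    a.iter().zip(b.iter()).all(|(ca, cb)| ca.abs_diff(*cb) <= tol)
}

/// 8-connectivity neighbor offsets.
const NEIGHBORS: [(i32, i32); 8] = [
    (-1, -1), (0, -1), (1, -1),
    (-1,  0),          (1,  0),
    (-1,  1), (0,  1), (1,  1),
];

/// A differing pixel counts as anti-aliasing only when it blends into a
/// similar neighbor in *both* images; otherwise it's a real change.
fn is_antialiasing(base: &Image, cand: &Image, x: i32, y: i32, tol: u8) -> bool {
    [base, cand].iter().all(|img| {
        let Some(center) = img.get(x, y) else { return false };
        NEIGHBORS.iter().any(|&(dx, dy)| {
            img.get(x + dx, y + dy)
                .is_some_and(|p| close_enough(center, p, tol))
        })
    })
}
```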

This conservative approach is better for visual regression testing. In benchmarks on real Vizzly screenshots, Honeydiff catches 19% more real differences than odiff’s AA detection. Where odiff filtered 4,802 pixels as AA artifacts, Honeydiff only filtered 3,483 — catching 1,319 additional real changes that would have been silently ignored. You want to catch legitimate UI changes, not just ignore everything that might be AA.

Rich Metrics That Enable Better Features

Here’s what gets me excited about Honeydiff: it’s not just about “same” or “different.” It gives you rich data about visual changes that enables way better features and workflows.

Spatial Clustering - Uses two-pass connected-components labeling with 8-connectivity to identify distinct regions of change. Instead of saying “10,000 pixels changed,” you get “5 separate regions changed” with each region’s pixel count and center of mass. This makes diff visualizations way more useful: you can see each region with its exact location, not just a sea of red pixels. The clustering data also enables smarter diff overlays and better visual analysis tools.
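If you haven't seen two-pass labeling before, here's the shape of it: one pass assigns provisional labels and records which labels touch, a second pass flattens them. This is a sketch of the technique over a boolean diff mask with a bare-bones union-find, not Honeydiff's implementation, which also accumulates per-region pixel counts and centers of mass:

```rust
/// Two-pass connected-components labeling (8-connectivity) over a diff
/// mask of width*height booleans. Returns a label per pixel; equal
/// nonzero labels belong to the same region of change.
fn label_clusters(mask: &[bool], width: usize, height: usize) -> Vec<u32> {
    let mut labels = vec![0u32; mask.len()]; // 0 = background
    let mut parent: Vec<u32> = vec![0]; // parent[label] -> equivalence root

    fn find(parent: &[u32], mut l: u32) -> u32 {
        while parent[l as usize] != l {
            l = parent[l as usize];
        }
        l
    }

    // Pass 1: give each changed pixel a provisional label, merging with
    // any already-visited neighbors (left, top-left, top, top-right).
    for y in 0..height {
        for x in 0..width {
            if !mask[y * width + x] {
                continue;
            }
            let mut label = 0u32;
            for (dx, dy) in [(-1i32, 0i32), (-1, -1), (0, -1), (1, -1)] {
                let (nx, ny) = (x as i32 + dx, y as i32 + dy);
                if nx < 0 || ny < 0 || nx >= width as i32 || ny >= height as i32 {
                    continue;
                }
                let neighbor = labels[ny as usize * width + nx as usize];
                if neighbor == 0 {
                    continue;
                }
                let root = find(&parent, neighbor);
                if label == 0 {
                    label = root;
                } else if root != label {
                    // Record that the two provisional labels are one region.
                    let (lo, hi) = (label.min(root), label.max(root));
                    parent[hi as usize] = lo;
                    label = lo;
                }
            }
            if label == 0 {
                label = parent.len() as u32; // brand-new region
                parent.push(label);
            }
            labels[y * width + x] = label;
        }
    }

    // Pass 2: flatten every provisional label to its equivalence root.
    for l in labels.iter_mut() {
        if *l != 0 {
            *l = find(&parent, *l);
        }
    }
    labels
}
```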

Intensity Statistics - Min, max, mean, median, and standard deviation of diff intensities. This tells you the story of what changed. Is it one bright pixel that’s way off, or is it a subtle color shift across a large section? When you’re testing with live data and content changes slightly, intensity stats help you understand if the visual structure is still intact.

SSIM Perceptual Scoring - Structural Similarity Index measures how similar images are from a human perception standpoint (0.0-1.0 score). This is expensive to calculate, but it’s incredibly valuable for visual testing with live data. Content might change, but if the SSIM score is still high, you know the UI structure is preserved. This makes it possible to test real applications with dynamic content instead of requiring static fixtures.
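For reference, the standard SSIM formula compares small local windows x and y of the two images using their means (μ), variances (σ²), and covariance (σ_xy), with small constants C₁ and C₂ for numerical stability:

SSIM(x, y) = [(2·μ_x·μ_y + C₁)(2·σ_xy + C₂)] / [(μ_x² + μ_y² + C₁)(σ_x² + σ_y² + C₂)]

A score near 1.0 means the local structure matches even when individual pixel values have shifted, which is exactly why it works for live data: swapped text or images move pixels, but layout breakage changes structure.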

Bounding Boxes - Get the exact rectangular region containing all changes. Perfect for cropping diff views or highlighting exactly where attention is needed. The diff UI can focus on what actually changed instead of showing you the entire screenshot.

Flexible Screenshot Dimensions - Compare screenshots with the same width but different heights. Real web applications scroll. Mobile apps scroll. Honeydiff handles variable-height comparisons natively: overlapping regions are compared pixel-by-pixel, and extra rows are counted as differences with proper visual highlighting.
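A sketch of that logic, reusing the illustrative Image type from the anti-aliasing example above (again, names and types are assumptions, not Honeydiff's real API):

```rust
/// Compare two same-width images of different heights: the overlapping
/// rows get a normal pixel comparison, and every pixel in the extra
/// rows counts as a difference. Illustrative only.
fn diff_with_height_mismatch(a: &Image, b: &Image) -> Result<u64, &'static str> {
    if a.width != b.width {
        return Err("widths must match");
    }
    let overlap = a.height.min(b.height);
    let mut diff_count = 0u64;

    // Overlapping region: ordinary pixel-by-pixel comparison.
    for y in 0..overlap {
        for x in 0..a.width {
            if a.get(x, y) != b.get(x, y) {
                diff_count += 1;
            }
        }
    }

    // Rows only present in the taller image are differences by definition.
    let extra_rows = (a.height.max(b.height) - overlap) as u64;
    Ok(diff_count + extra_rows * a.width as u64)
}
```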

You can enable just what you need. SSIM is expensive? Don’t include it. Need clustering for diff analysis? Turn it on. The library adapts to your use case, and the modular architecture means I can keep adding more sophisticated analysis as visual testing evolves.
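In practice that looks something like an options struct with per-metric toggles. The field names below are hypothetical, purely to illustrate the opt-in shape; they are not Honeydiff's actual API:

```rust
/// Hypothetical per-comparison toggles; not Honeydiff's real API.
#[derive(Default)]
struct DiffOptions {
    compute_ssim: bool,      // expensive: leave off unless you need it
    compute_clusters: bool,  // connected-components clustering for diff UIs
    compute_intensity: bool, // min/max/mean/median/stddev of intensities
    detect_antialiasing: bool,
}

fn main() {
    // A local TDD run might want clustering and AA filtering but skip SSIM.
    let local_tdd = DiffOptions {
        compute_clusters: true,
        detect_antialiasing: true,
        ..Default::default()
    };
    assert!(!local_tdd.compute_ssim); // SSIM stays off by default
}
```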

Two Color Spaces, One Library

Different projects need different approaches to color comparison:

  • RGB mode (default): Exact, pixel-perfect matching. Fast. Use pixel_tolerance (0-255) for fine control.
  • YIQ mode: Perceptual matching weighted for human vision. Use color_threshold for more forgiving comparisons.

YIQ is what odiff uses by default (threshold of 0.1), and Honeydiff implements the exact same color space conversion. RGB mode is faster and great for when you need precise control.
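For the curious, RGB-to-YIQ uses the standard NTSC coefficients. The exact constants in odiff and Honeydiff may carry more decimal places than the rounded values here, but the shape is this:

```rust
/// RGB -> YIQ with the standard NTSC coefficients (rounded).
fn rgb_to_yiq(r: f32, g: f32, b: f32) -> (f32, f32, f32) {
    let y = 0.299 * r + 0.587 * g + 0.114 * b; // luma: dominates human perception
    let i = 0.596 * r - 0.274 * g - 0.322 * b; // orange-blue chroma axis
    let q = 0.211 * r - 0.523 * g + 0.312 * b; // purple-green chroma axis
    (y, i, q)
}
```

Weighting the comparison by luma first is why YIQ mode is more forgiving: small chroma shifts the eye barely registers don't trip the threshold.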

A Fresh Direction for Visual Testing

Visual testing has been stuck in the same patterns for years. Screenshot comparison tools give you a simple “different” or “same” answer. Maybe a diff percentage. That’s it.

But building visual quality into the development workflow needs so much more. When you’re doing local TDD with vizzly tdd, you need instant feedback. When you’re collaborating with your team on visual changes, you need rich data about what changed and where. When you’re building sophisticated review workflows, you need the foundation to support them.

That’s why I built Honeydiff the way I did. Fast enough for real-time workflows. Rich enough to power advanced features. Flexible enough to evolve as visual testing needs grow.

Right now it powers our visual development workflow, from local TDD to team collaboration. But I’m just getting started:

Smarter diff visualizations: The rich clustering and intensity data enables better diff overlays, ones that are more readable and help you understand changes at a glance instead of just highlighting pixels.

Better live data testing: SSIM scoring and intensity statistics are the foundation for smarter comparisons that can tell the difference between “the content changed” and “the UI broke” when testing with dynamic data.

Tailored for real workflows: Because I built this for Vizzly’s specific needs, I can keep adding features that make visual testing better for actual development workflows, not just theoretical use cases.

More rich metrics: The modular architecture means I can keep experimenting with new algorithms and analysis techniques. More data about what changed, better ways to understand visual differences, smarter ways to filter noise from signal.

Visual testing needed a fresh foundation built for where it’s going, not where it’s been. Something fast enough for development workflows. Smart enough for sophisticated analysis. Flexible enough to keep improving.

Experience It in Vizzly

Want to see instant visual feedback in a real workflow? Honeydiff powers Vizzly’s visual development platform.

Local TDD Mode - Run vizzly tdd while you’re coding and see visual changes instantly. Processing 18-million-pixel comparisons in under 100 milliseconds means instant visual feedback that never keeps you waiting.

Team Collaboration - When your CI creates builds automatically, Honeydiff’s rich diff data helps teams understand exactly what changed. Spatial clustering, intensity statistics, and perceptual scoring make visual review more informative.

Live Data Testing - SSIM scoring and smart metrics mean you can test with real data instead of static fixtures. The engine can tell the difference between content changes and actual visual bugs.

Sign up for Vizzly and experience visual quality integrated into your development workflow. Your future self will thank you for catching visual issues while you’re still in the flow of coding, not days later in a separate testing phase.

What’s Next

I’m just getting started. Honeydiff powers Vizzly today, and I’m excited about where this foundation can take visual testing.

The world needs better visual quality tooling. Tools that integrate into development workflows instead of sitting on the sidelines. Tools fast enough for TDD and accurate enough for production. Tools that provide rich data instead of simple pass/fail results.

Honeydiff is our foundation for making that happen. Visual testing needed a fresh direction, and I’m building it.


Honeydiff powers Vizzly’s visual development workflow platform. Learn more about local TDD mode and team collaboration features.