Is SVG the final frontier?


Vectorizing images is the process of taking a bunch of pixels and converting them into a shape-based representation. The result scales infinitely, since shapes can be rendered at arbitrary sizes. In the context of websites, it can also mean smaller file sizes and animatable graphics.
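
To make "shape-based representation" concrete, here is a tiny hand-written SVG, purely illustrative: instead of a grid of pixels, the file stores a couple of shapes and their attributes, which a renderer can draw crisply at any size.

```svg
<!-- Illustrative only: two shapes instead of a grid of pixels. -->
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
  <circle cx="50" cy="40" r="25" fill="#4a90d9"/>
  <rect x="20" y="70" width="60" height="15" rx="4" fill="#333"/>
</svg>
```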

Approximating pixels with a textual representation of shapes (often stored as SVGs) seems like a natural task for LLMs. They are masters of token generation and are increasingly capable of complex tasks: they can manipulate entire codebases while keeping internal consistency and code styling, updating unit tests as they go and implementing new features. Yet generating vector graphics has proven to be quite the challenge for them.

Some people, mostly jokingly, have gone as far as claiming SVG generation is the final frontier for AI. It most likely isn't, but AI is also just not (currently) very good at it. Simon Willison illustrates this with his canonical pelicans-on-bicycles benchmark. Despite the benchmark's popularity and its long public presence on the internet, AI hasn't meaningfully improved at drawing pelicans. There's some progress for sure, but you're not likely to use the output on your website or for your next logo. This is in stark contrast to almost every other benchmark out there, which LLMs have been saturating faster than we can create them.

Where we have seen incredible progress is in raster, pixel-based image generation. Major advances came with the rise of diffusion models. In what seems like almost the polar opposite approach, diffusion models denoise a latent or pixel representation over many steps: we start out with noise and gradually adjust the pixels until an image emerges. Popularized by models like Stable Diffusion and FLUX, this approach now powers very capable systems like Google's Nano Banana, generating a wide variety of images while accurately following the user's prompt and supporting edits to just the relevant areas.
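
As a rough sketch of that denoising loop (not the exact update rule of any particular model; `predict_noise` stands in for a trained network and the noise schedule is illustrative), the sampling process looks something like this:

```python
import numpy as np

def predict_noise(x, t):
    # Placeholder for a trained network that predicts the noise present in x at step t.
    return np.zeros_like(x)

num_steps = 50
betas = np.linspace(1e-4, 0.02, num_steps)   # illustrative noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

x = np.random.randn(64, 64, 3)               # start from pure noise
for t in reversed(range(num_steps)):
    eps = predict_noise(x, t)
    # Remove a bit of the predicted noise (a DDPM-style update)...
    x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    if t > 0:
        # ...and re-inject a smaller amount of fresh noise for the next step.
        x += np.sqrt(betas[t]) * np.random.randn(*x.shape)
# After the final step, x is (ideally) a clean image in pixel or latent space.
```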

With the great success of these pixel-based image generation models, vector-based generation has taken a back seat. That's not to say there aren't interesting directions being explored. LLM4SVG, for example, aims to teach LLMs about SVGs by explicitly encoding them as semantic tokens. StarVector and OmniSVG attempt to harness the power of VLMs (vision-language models) to support SVG natively, each with its own nuances.

Ultimately, though, these remain research projects and haven't broadly found their way into production, whether due to extremely slow processing times or low-quality results, and usually both. For users looking to create vector graphics from prompts, the best results are currently achieved by coupling the best available image models with the best image vectorizers in a two-step process. This is why we decided to focus on this problem space now.
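
In code, that two-step pipeline is conceptually simple. The sketch below uses hypothetical stand-ins (`generate_image`, `trace_to_svg`) rather than any specific model or vectorizer; each would wrap whatever text-to-image API and tracing tool you choose.

```python
def generate_image(prompt: str, png_path: str) -> str:
    """Step 1: call a text-to-image model (e.g. a diffusion model) and save a raster image."""
    raise NotImplementedError("wrap your image model's API here")

def trace_to_svg(png_path: str, svg_path: str) -> str:
    """Step 2: convert the raster image into paths and shapes with a vectorizer."""
    raise NotImplementedError("wrap your vectorization tool here")

def prompt_to_svg(prompt: str) -> str:
    """Prompt -> raster image -> SVG."""
    png = generate_image(prompt, "out.png")
    return trace_to_svg(png, "out.svg")
```

The quality of the final SVG then depends on both halves: how faithfully the image model follows the prompt, and how cleanly the vectorizer approximates the resulting pixels with shapes.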

In a sense, modern image models have made vectorization even more valuable. In the future, some vectorization techniques may be absorbed into foundation models themselves, and with enough progress, models may eventually become much better at native vector-graphics understanding and generation. But for now, for users who want to bring the power of AI to vector graphics, the most practical path remains a two-step one.