The Unreasonable Effectiveness of SDFs, Part 1


Posted: 2024-10-14
Last Updated: 2025-10-09

You say "To hell with distance,
remember who you're talking to."
I say "Closeness is too much for me"
and dismiss you with a smile.

keywords: sdf, graphics

FYI: This article is part of a series; see also: Part 2, Part 3 [note 1]
This first part, though long, is mostly an introduction, so if you're already familiar with SDFs, feel free to skip ahead.

Representing Graphics

This series is about using "signed distance fields" (SDFs) for computer graphics. They're a simple and powerful concept — so good, using them almost feels like cheating, hence the title of this series. But before we dive in, let's take a step back and talk about graphics in general. Then, we'll see how SDFs fit in with those other, perhaps more familiar concepts.

Pixels

You probably know that the images on a computer screen are made of tiny little colored dots, called pixels. Ultimately, any fancy algorithms and approaches we use will result in picking colors for pixels. So, let's start there.

Why not just use pixels directly?

We certainly can! For instance, an image from a digital camera is stored as a 2D array of color values; [note 2] we could use those values as our pixel colors directly.

But this is fairly limiting; we generally want to *do things* with images — put them at different locations, maybe swap between them to make a flipbook-style animation, and so on. We need a little more flexibility.

Space Invaders aliens, very pixelated.

Space Invaders. The original arcade game had a black & white display, with colored plastic strips placed on top.
Some good info here.

Texels, Blitting, Sprites

So, we add a layer of indirection: instead of the color data directly corresponding to pixels, we'll call it something else ("texels" is common these days), and then specify some way to pick which texel goes to which pixel. A simple choice is to give some starting corner position, and have the rectangle of pixels filled in from there.
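To make that concrete, here is a minimal sketch of such a blit in Python. The names and layout here are illustrative, not any particular hardware's API:

```python
# A minimal sketch of a "blit": copy a rectangle of texel data into a
# pixel buffer at a given corner position.

def blit(pixels, texels, x0, y0):
    """Copy the 2D texel array into the pixel buffer with its top-left
    corner at (x0, y0), clipping against the buffer's edges."""
    ph, pw = len(pixels), len(pixels[0])
    for ty, row in enumerate(texels):
        for tx, texel in enumerate(row):
            px, py = x0 + tx, y0 + ty
            if 0 <= px < pw and 0 <= py < ph:
                pixels[py][px] = texel
    return pixels

# A 4x4 buffer of background color 0, with a 2x2 sprite of color 7
# blitted at position (1, 1):
screen = [[0] * 4 for _ in range(4)]
sprite = [[7, 7], [7, 7]]
blit(screen, sprite, 1, 1)
```

Real blitters did this in hardware, often a whole row of texels at a time, but the "copy a rectangle from here to there" core is the same.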

This was how some early graphics hardware worked. For instance, see the fun article "Street Fighter II, The World Warrier" (note the mis-spelling). The process of transferring a chunk of image data from one place to another was sometimes called "block image transfer," which morphed into the terms "blit," "bit blit," "blitter" and others. [note 3]

So, now our graphics pipeline looks like this:

A transition diagram, from "position,texels" to "pixels" via the "blit" action.

With some small modifications, we can expand that to cover a lot of the capabilities of classic video game hardware, arcade systems, etc. For instance, rather than just a position, we could have a destination *rectangle*, so the image gets stretched/grown/shrunk as it is placed onscreen. We could make that "blit" operation more complex, to handle color palettes, transparency, and more. Again, Fabian Sanglard's Street Fighter II series is a fascinating reference for some real-world examples.

We start to see the beginnings of a spectrum:

The previous transition diagram is updated to have a continuum, from "simple/direct/low-level/inflexible" on the left, to "complex/indirect/high-level/flexible" on the right. Pixels are at the far left.

Ultimately, we want pixels. But in order to do more useful things, we operate at a more abstract level (towards the right), and then convert that data back to pixels via some transformation (blitting, in this case).

Aside: The term "sprite" became popular, to refer to a 2D image that would be composited into a larger scene. Apparently, the etymology is from the mythological sense of spirit/ghost, since graphical sprites seem to "float above" the background or live in a different plane of existence. I find it somehow fitting that this abstraction, a step removed from the reality of pixels, has its origins in a different sort of abstraction, and a different sense of being removed-from-reality.

3D, Meshes

Apparently, some people like 3D graphics. This puts demands on our graphics system that simple rectangle blitting cannot satisfy. [note 4] So, a more flexible abstraction was developed, with hardware support to make it practical — a "mesh" of triangles, with textures applied (and lighting, etc.). The process of coloring pixels based on the transformed, lit mesh is called "rasterization." [note 5] In broad strokes, this is basically the same concept as blitting, just "made 3D."

A 3D "assault frigate" model, from the game Homeworld (1999)

This is still a popular way to make 3D graphics today, though the transformations involved are more complicated; there can be lots of textures and complex lighting calculations, for instance. It may look fancy, but it's still typically just triangles being rasterized. In our spectrum, we'll put meshes and rasterization slightly to the right of blitting rectangles, since they're a little more flexible but broadly similar. As before, we ultimately still produce pixel data:

The transition diagram is updated again: now there is sprite→blit→pixels and mesh→rasterize→pixels, with some example images for each.

Aside: There have been several increasingly-powerful gradations of the rasterization pipeline, as evidenced by the ever-growing capabilities of GPUs. Originally, there was the fixed-function pipeline. Then, shaders, of increasing complexity. These could be seen as veering further right in our spectrum, driven by forces demanding more flexibility, indirection, and high-level control. But we lump all that together, to keep our diagram from sprawling too much.

Metaballs

We saw this push towards the right-hand side of the spectrum, because of the benefits we get. Blitting to/from rectangles rather than directly setting individual pixels allows us to think about whole objects moving onscreen. Transforming and lighting meshes allows us to deal with 3D objects — which can be rotated, deformed, animated, have shadows, etc. Can we go further to the right, to an even more abstract, but perhaps even more powerful representation?

Yes, we can!

Now, we get into the territory of so-called "implicit representations." SDFs are in this zone, but first let's briefly discuss an earlier concept: metaballs. [note 6]

The general idea behind an implicit representation is that we have a whole "field" of data throughout space (like we might imagine an electric field), with a value at each location (voltage, for instance). There is no explicit "object" or "surface" depicted by this field. Instead, if we want to draw it onscreen, we must decide how to use that field to derive visual object data. In the case of metaballs, we choose some value which defines the "skin" of our object(s), and draw that skin. That surface is where all values in the field are equal to the chosen value, and is called an isosurface. This is somewhat like picking one of the colors in a jawbreaker, or contour lines on a topographic map:

Aside: One weakness of typical metaballs implementations (which SDFs will fix) is that hard corners and sharp edges are lacking. All the shapes are "blobby." This is not a weakness of implicit representations in general.

What additional powers does this more-abstract representation grant us? The main one is that we can combine objects easily and smoothly. With meshes, if we have two of them, and we want to fuse them together, it is a complex task. There are lots of tricky computational geometry algorithms for doing this, full of corner cases and gotchas. You generally have to either pick between poor results or expensive and slow computation. It's a mess.

But with metaballs, we can very easily combine two fields. One way is summing the values — the new field equals the sum of the others. Other simple combinations, such as subtraction, or taking the minimum or maximum, also produce useful results. They are all easy to do, and result in a new field that incorporates both of the original ones. Now, instead of dealing with the mess of combining two meshes, there *are no* meshes. We do our combining in the realm of the more abstract fields, and when we're done, we can draw the "skin" of that final field.
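Here's a little Python sketch of that idea. The inverse-square falloff used here is just one common choice (Blinn's original paper used an exponential falloff), and all the names are illustrative:

```python
# A sketch of combining metaball fields by summing them.

def ball_field(center, strength):
    """Return a field function: strength falls off with squared distance."""
    def field(p):
        d2 = sum((a - b) ** 2 for a, b in zip(p, center))
        return strength / d2 if d2 > 0 else float("inf")
    return field

def combine(*fields):
    """The combined field is simply the sum of the input fields."""
    return lambda p: sum(f(p) for f in fields)

f = combine(ball_field((0.0, 0.0), 1.0), ball_field((2.0, 0.0), 1.0))

# Choosing an iso-value defines the "skin": points where f(p) == iso.
# The midpoint between the two balls feels contributions from both, so
# the skin can smoothly bridge them.
iso = 1.5
print(f((1.0, 0.0)) > iso)   # midpoint is "inside"
print(f((5.0, 0.0)) < iso)   # far away is "outside"
```

Notice that `combine` never needs to know what shapes it's combining — it just adds fields. That's the whole trick.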

So, how do we draw the skin? There are two main approaches: (1) ray-casting or (2) converting it to a mesh. In Blinn's paper, he uses approach (1). For each pixel, we draw an imaginary line from our eye through that pixel. We query our field along that line, to see if we hit something. When we have a hit, we can also determine the angle of impact, to do lighting and so forth, to eventually decide on the color of that pixel.

With approach (2), we simply build a traditional mesh, and then use our well-established processing pipeline [note 7] to do the rest of the work, as before. Building that mesh can be a little tricky, but there are reasonably straightforward algorithms to accomplish it, such as Marching Cubes (now patent free!). We'll call that process "tessellation." [note 8]

Now, our spectrum of options looks like this:

Another update to the transition diagram; now there is a metaballs→ray-cast→pixels entry, as well as metaballs→tessellate→mesh, in addition to what was there before. The metaballs entry is at the far-right side of the continuum.

Aside: It's a worthwhile mental exercise to consider what else might lie on this spectrum, farther to the left of pixels, and farther to the right of implicit geometry. But we leave it here, for now.

Heightmaps

One final entry, before we get to SDFs. The metaball example is perhaps a bit hard to internalize — the notion of an "electric field" or such, and how to interpret that as 3D geometry, is rather abstract. So, let's look at a more concrete example of an implicit representation: heightmaps.

As before, we do not have any explicit graphical object; we just have a field of data. But in this case, the values in the field represent simple height. Imagine some terrain from a bird's-eye view, and at each position, looking down, you record the height of the ground — that's our field.

Here, the "object skin" that you derive from the field of data is the ground's surface. This is perhaps easier to imagine than fuzzy electron clouds.

As before, the way we draw them is generally either via ray-casting or generating a mesh. However, generating a mesh is much easier in this special case, so that is a common choice. A simple naïve approach could start with a flat grid of triangles (we know how to draw those) and adjust the height of each vertex in accordance with the heightmap value at that point.
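That naïve approach is simple enough to sketch directly. This is an illustrative toy, not any particular engine's mesh format:

```python
# A naive heightmap tessellation: start from a flat grid, lift each
# vertex by the sampled height, and split every grid cell into two
# triangles.

def heightmap_to_mesh(heights):
    """heights: 2D list, heights[y][x]. Returns (vertices, triangles),
    where each triangle is a tuple of indices into the vertex list."""
    rows, cols = len(heights), len(heights[0])
    # One vertex per sample: (x, height, z), with the grid in the XZ plane.
    vertices = [(x, heights[y][x], y) for y in range(rows) for x in range(cols)]
    triangles = []
    for y in range(rows - 1):
        for x in range(cols - 1):
            i = y * cols + x  # top-left vertex of this cell
            triangles.append((i, i + 1, i + cols))
            triangles.append((i + 1, i + cols + 1, i + cols))
    return vertices, triangles

verts, tris = heightmap_to_mesh([[0, 1], [0, 2]])
# 2x2 samples -> 4 vertices; one cell -> 2 triangles
```

From here, the standard mesh pipeline takes over — the heightmap's job is done.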

Heightmaps are a slightly weaker representation, since they describe a 3D surface using only a 2D field of data. So, they cannot represent some 3D features, such as concavities. But, they are also easier to deal with, as a result.

One final update to our diagram/continuum, to include everything:

Final diagram, with SDFs near metaballs, heightmaps slightly farther right. There is a dotted line dividing pixels, on the left, from everything else. On the left of the dotted line is "physical" and on the right is "abstract." Also, there is a grouping of the far-right items, denoted "Implicit Geometry."

SDFs, At Last

SDFs are yet another implicit representation, but with a few twists that make them unique. Firstly, the field of values represents physical distance, which makes them easier to work with than the more abstract metaballs, and enables some extra tricks. [note 9] Secondly, unlike with metaballs, it's easy to represent hard-edged or pointy objects like rectangles, cones, etc.

A collection of colored 3D objects (cones, capsules, etc.).

Each of these shapes is defined by a simple distance function (to be discussed).
via: Shadertoy

The fundamental concept of an SDF is: given some point in space, we have a function that will tell us "how far is that point from the object?" We'll describe some of those functions in a moment, but let's just say we magically have it available, for now.

This is a little bit like a bat doing echolocation — it can't see its surroundings, but it can send out a squeak, and notice how quickly that sound bounces back, to determine distance. With enough squeaks, it can build a mental image of its surroundings. Or perhaps imagine a submarine, mapping the ocean floor, sending out sonar pings in a similar fashion. In truth, echolocation gives quite a bit more data than just a single distance value, but you could think of this as a simplified version.

So, all we need is some function that looks like:

distance = f(position)
// shortened to:
d = f(p)

Thus, the function f defines a "distance field," over all the positions. And if we allow the distance to be negative when p is *inside* the object, we call it a "signed distance field" (or SDF, for short).

But surely this magical function which tells us what we want to know is vexingly complex or difficult to work with? Not so! Let's go through some examples.

Shapes

Perhaps the simplest SDF is a circle. As I'm sure you know, a circle is defined by a center point, c and a radius, r. How do we know how far p is from that circle? We just check how far it is from c, then adjust for r:

sdCircle(p, c, r) {
    // get distance from p to c
    d = length(p-c)
    // shift that value by 'r', so our answer is
    // 0 when d==r and positive when d>r.
    return d - r
}

Likewise, here is the formula for a box:

sdBox(p, c, radii) {
    // get offset from center
    delta = p-c
    // adjust for the box's radii of width and height
    d = abs(delta)-radii;
    // clamp to the nearest side or corner of the box
    return length(max(d,0)) + min(max(d.x,d.y),0);
}

That formula is clearly a bit more complex than the circle, but still not awful. Inigo Quilez (iq) has a nice little video describing the derivation of that formula.
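If you want to play with these outside of a shader, here are the same two SDFs as plain Python, for 2D points (x, y). Shader languages apply abs/max per component; we spell that out here:

```python
import math

def sd_circle(p, c, r):
    """Signed distance from point p to a circle centered at c, radius r."""
    d = math.dist(p, c)
    return d - r  # 0 on the edge, negative inside, positive outside

def sd_box(p, c, radii):
    """Signed distance from point p to an axis-aligned box centered at c,
    with half-extents radii = (half_width, half_height)."""
    dx = abs(p[0] - c[0]) - radii[0]
    dy = abs(p[1] - c[1]) - radii[1]
    outside = math.hypot(max(dx, 0.0), max(dy, 0.0))
    inside = min(max(dx, dy), 0.0)
    return outside + inside

print(sd_circle((3.0, 0.0), (0.0, 0.0), 1.0))   # 2.0: two units outside
print(sd_box((0.0, 0.0), (0.0, 0.0), (1.0, 1.0)))  # -1.0: center of a 2x2 box
```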

Skipping ahead, there are lots of nice, compact formulas for all sorts of shapes: rings, cones, triangles, prisms, capsules, and so on. The image above demonstrated many of them. Again, iq's site is a marvelous reference.

Rendering

So, we have a magical SDF function that will tell us how far we are away from a shape. How can we use that to light up pixels on the screen?

As we discussed with metaballs, the two main approaches are ray-casting and tessellation. Indeed, both of these work with SDFs, too. However, with SDFs, we can take advantage of the fact that we have *distance* rather than just some nebulous "field strength" value. A typical rendering approach for 3D SDFs uses "ray marching," which is a ray-casting approach that takes advantage of the distance values. Here's a nice video that describes the process: YouTube link

"Suppose the distance estimator tells us we're at least 5 meters away from the scene. Then we know, no matter what, we can safely march 5 meters in any direction without over-shooting. After marching, we check the distance estimator again at the new point..."

There's more you can get "for free" from the distance function, too — soft shadows, glow, etc (see the video).
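The quoted marching idea fits in a dozen lines. Here's a sketch in Python (real implementations run per-pixel on the GPU; the constants and names here are just illustrative):

```python
import math

def sd_sphere(p, center, r):
    return math.dist(p, center) - r

def ray_march(origin, direction, sdf, max_steps=64, hit_eps=1e-4, max_dist=100.0):
    """March from origin along a unit-length direction. Returns the distance
    traveled to the surface, or None if we never hit anything."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p)
        if dist < hit_eps:
            return t       # close enough: call it a hit
        t += dist          # safe to step this far without overshooting
        if t > max_dist:
            break
    return None            # ray escaped (or ran out of steps)

scene = lambda p: sd_sphere(p, (0.0, 0.0, 5.0), 1.0)
hit = ray_march((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), scene)
# The sphere's near surface is 4 units straight ahead, so hit is ~4.0
```

The beauty is in the `t += dist` line: the distance field itself tells us how big a step is safe, so we take giant strides through empty space and only slow down near surfaces.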

But that's all 3D. And I'm making a 2D game. As you might imagine, doing 2D rendering is rather *easier* than 3D. There's no need for ray casting. Or, if you like, the "ray" being cast is just a single point — corresponding to the pixel in question. Since our distance function gives positive values for "outside the object" and negative values for "inside the object", we can just set the pixel color based on this information.

With a naïve approach, this results in hard transitions between inside/outside pixels, which causes aliasing issues ("jaggies"). But since we have distance available, we know when a given pixel is near the edge, and we can soften the transition.
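One common way to do that softening (an assumption on my part — there are variants) is to map distance to opacity with a smoothstep ramp about one pixel wide:

```python
def smoothstep(e0, e1, x):
    """The standard shader smoothstep: clamped cubic interpolation."""
    t = min(max((x - e0) / (e1 - e0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)

def coverage(d, px=1.0):
    """Map a signed distance to opacity: 1 deep inside the shape, 0 well
    outside, with a smooth ramp roughly one pixel (px) wide at the edge."""
    return 1.0 - smoothstep(-0.5 * px, 0.5 * px, d)

print(coverage(-2.0))  # 1.0: well inside
print(coverage(2.0))   # 0.0: well outside
print(coverage(0.0))   # 0.5: right on the edge
```

Pixels straddling the edge get partial opacity, and the jaggies melt away.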

Note: After we get the distance value for a pixel, choosing a color is a separate step. As we'll see in Part 3, this separation is important, to improve performance.

Combining Shapes

Using plain objects (circle, square, etc.) is fine, but to build more complex shapes, we need to combine them together. Just like with metaballs, this is easily done — generally with simple operations, like addition, subtraction, min/max, etc. Some examples:

Three SDF operation examples: union, subtract, onion. There is a graph to the right of each, showing how the result is computed from the input SDF(s). Union is min(d1,d2), subtract is max(d1,-d2), and onion is abs(d) - r.

The dashed line is a representative slice through the SDF, which is graphed on the right. This shows how each operation is just a function of 1D distances.

A few things to note: first, the distance field is often "corrupted" a bit by these operations. For instance, look at the subtraction example — you can see the effects of the rectangle still present in the distance field — that is *not* what the true distance field for that shape would be (the stripey shader, courtesy of iq, is useful for noticing these issues). But for many cases, it is good enough. And in particular, the *silhouette* is preserved perfectly, even if the invisible parts of the field are contaminated.

Second, it is quite convenient to analyze operations in terms of simple 1D functions, as graphed above. I like to use Graphtoy for this (also from iq). Here's an example of the "smooth union" operator: Graphtoy link
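To underline how simple these operations are as 1D functions, here they are in Python. The smooth union shown is (I believe) iq's polynomial smin; `k` controls the blend radius:

```python
def op_union(d1, d2):
    return min(d1, d2)

def op_subtract(d1, d2):
    """Shape 1 with shape 2 carved out of it."""
    return max(d1, -d2)

def op_onion(d, r):
    """Hollow a shape out into a shell of thickness r."""
    return abs(d) - r

def op_smooth_union(d1, d2, k):
    """Like union, but with a rounded blend where the shapes meet."""
    h = max(k - abs(d1 - d2), 0.0) / k
    return min(d1, d2) - h * h * k * 0.25

# A point 1 unit inside shape A and 3 units outside shape B:
print(op_union(-1.0, 3.0))      # -1.0: inside the union
print(op_subtract(-1.0, 3.0))   # -1.0: B is far away, still inside A
print(op_onion(-1.0, 0.25))     # 0.75: outside the 0.25-thick shell
```

Away from the seam, `op_smooth_union` degenerates to plain `min`; only where the two distances are within `k` of each other does the rounding term kick in.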

Transforming Shapes

In our fundamental formula d = f(p), we've messed with f and d to give us Shapes and Operations, respectively. Now it's time to mess with p.

Transformations alter the position being evaluated. A simple example is to just move the position (translate by some (x,y)), but transforms can get a lot more complicated than that. Not only can we do any affine transform by doing a matrix multiply, we can also invent "warping" transformations that twist or bend space. A few examples:

Some examples of SDF transforms. The first is a box transformed into a rotated rhombus, via matrix multiply. The second is a diagonal capsule transformed into a heart-like shape, via mirror along the Y axis. The last is a box-within-box shape that becomes bowed outwards, by a "bulge" transform.
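The key point, easy to miss at first, is that we transform the *position* by the inverse: to rotate the shape by θ, we rotate the sample point by -θ before the SDF sees it. A sketch (using a box SDF like the one shown earlier):

```python
import math

def sd_box(p, c, radii):
    """2D axis-aligned box SDF, as shown earlier."""
    dx = abs(p[0] - c[0]) - radii[0]
    dy = abs(p[1] - c[1]) - radii[1]
    return math.hypot(max(dx, 0.0), max(dy, 0.0)) + min(max(dx, dy), 0.0)

def translated(sdf, offset):
    """Move a shape by `offset` by shifting the evaluated position back."""
    return lambda p: sdf((p[0] - offset[0], p[1] - offset[1]))

def rotated(sdf, angle):
    """Rotate a shape by `angle` by rotating the position the other way."""
    c, s = math.cos(angle), math.sin(angle)
    return lambda p: sdf((c * p[0] + s * p[1], -s * p[0] + c * p[1]))

box = lambda p: sd_box(p, (0.0, 0.0), (1.0, 1.0))
moved = translated(box, (5.0, 0.0))
spun = rotated(box, math.pi / 4)  # a diamond: corners on the axes

print(moved((5.0, 0.0)))  # -1.0: the center of the moved box
# The rotated box's corner reaches sqrt(2) up the Y axis, so a point at
# (0, 2) is about 0.586 units outside it.
print(spun((0.0, 2.0)))
```

Warping transforms (bends, bulges, wiggles) work the same way — they're just non-linear functions of `p` instead of a matrix multiply.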

All Together Now

Thus, the basic building blocks we have available are:

Three formulae: (1) Shapes, producing distance values from positions [d = f(p)], (2) Operations, combining two shapes' distance values [d_new = f(d1,d2)], and (3) Transforms, altering the position being evaluated [p_new = f(p)].

We can nest these together to create a complex expression for our d = f(p) equation:

An example complex formula: d = union( f_A(p), intersect( f_B(transform(pos)), f_C(transform(pos)) ) )

This functional-style notation is a little clunky, so we'll generally use this set-style notation:

The above written in set-style notation: Shape = A ∪ tform[ B ∩ C ]

Here's one final example, exercising various concepts:

A sequence of shapes and corresponding expressions, building from simple to complex. From top to bottom, they are: A (a ring), B (a filled circle), C (a horizontal line, wider at the right), [C] (C but with a wiggle added), B-[C] (the circle with the wiggly C subtracted from it), A∪([B-[C]]) (the ring unioned with the previous item, which was also moved/rotated).

Uniform Scale

One special note on uniform scaling: it's a common operation, and SDFs provide a little "trick" to make it work extra-well. As mentioned earlier, many things have a side-effect of distorting "SDF space" so it is no longer truly a distance field. And by default, scaling would do that too. But in the special case of *uniform* scaling, we can use this trick.

Consider this circle SDF, which is scaled to be larger:

A sequence of distance fields (each a simple circle), with corresponding distance graphs below them, with position on the horizontal axis, distance on the vertical axis. The first one is a small circle. The second is a scaled-up version. Its graph shows a non-45° slope. The last one has the slope corrected, so it is a large circle with a correct 1-unit-per-unit SDF.

In these graphs, the horizontal axis is position, and the vertical axis is distance (for some representative "slice" through the full 2D space).

As you can see, after the initial scaling, the slope of the line is no longer 45° — it is no longer "1 unit per unit," and thus not a "true" SDF. [note 10] But if we then also scale the distance values, we can get it back to a slope of 1. With that, if the input was a true SDF, then the uniformly-scaled output will be, too. As an equation, we might write:

A distance formula, for some scale factor "S": d = f(p·S)/S

Note how this formula involves altering both position (like a transform) and distance (like an operation). Uniform scale is not quite neatly covered by our bag of tricks. This nuance also manifests itself in iq's presentation of a "uniform scale" function: [note 11]

// What the heck is "sdf3d"?!
float opScale( in vec3 p, in float s, in sdf3d primitive )
{
    return primitive(p/s)*s;
}

Unlike most of the other functions on the page, this is not valid shader code. sdf3d is a placeholder for some kind of "function pointer," but shaders don't have those. So, this is a bit of a dodge to express the concept succinctly, even though the code to actually make it happen is not so succinct.
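In a language with first-class functions, the concept is easy to express directly as a closure — which is exactly the luxury shaders lack. A Python sketch (names illustrative):

```python
import math

def sd_circle(p, r):
    """Circle at the origin with radius r."""
    return math.hypot(*p) - r

def op_scale(sdf, s):
    """Uniformly scale a shape by s: shrink the position going in, then
    re-expand the distance coming out, so the result stays a true SDF."""
    return lambda p: sdf((p[0] / s, p[1] / s)) * s

unit = lambda p: sd_circle(p, 1.0)
big = op_scale(unit, 3.0)  # a radius-3 circle

print(big((5.0, 0.0)))  # 2.0: two units outside the radius-3 circle
print(big((0.0, 0.0)))  # -3.0: the center is 3 units deep
```

Without the trailing `* s`, the field would still have the right zero-crossing (the silhouette) but the wrong slope — usable for plain rendering, but broken for the distance-based tricks.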

In Part 3 of this series, we'll see how to properly express this in our final system.

Cool SDF Examples

Taking all of these concepts together — shapes, operations, transforms, and the special rendering advantages that SDFs provide — people (not me :) have created some really impressive scenes using this technology. I'll link to a few of them here, for your viewing pleasure. We'll be abandoning 3D after this, but some of these are just too good not to share.

So, What Gives?

Hopefully those examples drive home the point that SDFs (and implicit representations in general) provide genuine powers that go beyond traditional rendering methods. SDFs have been popular in the demoscene and certain niche areas of computer graphics for a long time, but are not exactly widespread. If they're so great, why don't we see them everywhere in video games?

I think it comes down to a few main factors:

  • Inertia: A huge amount of technology and training has been invested into the current status quo.
  • Performance: These fancy SDF scenes rely on very complex shaders, which are slow to evaluate. Even a powerful GPU will struggle with a high-resolution complex SDF scene, because every pixel is expensive. Like ray tracing, SDFs are used in movies, which are rendered offline, but not so much in video games, which need real-time graphics. In my game, even relatively simple 2D SDFs were too slow without some optimizations (discussed later in this series).
  • Streaming Trouble: Most of the above impressive SDF scenes are a single giant shader. A video game (or other software) will typically want to move through various scenes, perhaps in an incremental fashion. For instance, an open-world video game would want to load in new chunks of the environment as the player moves around. Most of the examples above have SDFs that are built once, not incrementally updated. The "Marble Marcher" game side-steps this issue by making the entire world one big SDF, and having a hard loading transition when switching between maps.
  • Low Need: Some of the big advantages that SDFs provide, such as being able to combine shapes and perform interesting transforms, are not really needed by most "typical" games. Most games have fixed, pre-baked art assets; there's no need to smoothly combine things on-the-fly in real time. Games that *do* benefit from those features are mostly going to be procedurally-generated or user-generated-content games, which is a fairly niche market.

Over time, these techniques have been increasingly adopted, and I imagine that trend will continue. Also, as you can see, several of the problems relate to being used in real-time contexts. So, SDFs make more sense (and are indeed more commonly found) in "offline" contexts, such as rendering for films, and artists' modeling tools.

But what about me? I am making a procedural-world game, so SDFs felt like a perfect fit from the very beginning. I can make my own engine, so I do not have to fight any forces of inertia. That's two points down. Just streaming and performance left.

From my measurements, performance was *almost* good enough, with simple 2D objects. Once I had a lot of them, and/or they got more complex, I did need to address the issue though. However, that had a relatively straightforward solution (short answer: use caching). I'll discuss that later in the series.

No, the real deal-breaker was streaming. My game is an open world. I need to bring in new pieces of that world on-the-fly. The "everything is one giant hardcoded shader" approach, used by most of the above examples, simply would not suffice.

You might suggest: "Just re-make the giant shader when things change, adding or removing parts as needed." This works when there are a few occasional changes — indeed, that's what some of the above "3D editor" projects do. However, I wanted tens of thousands of objects, constantly coming in and out of view as the player moves. Re-compiling even a simple dynamically-generated shader (let alone a monstrous one) is not something you can do every frame in a 60Hz game.

You might suggest: "Rather than one giant shader, let each object have its own, and switch amongst them while rendering. Each one can be compiled in the background before it's needed." Unfortunately, switching amongst a thousand different shaders (corresponding to a thousand unique objects) in order to render a single frame is *also* far, far too slow. GPUs are not designed to process lots of shaders, each with a tiny amount of data. They want a few shaders, each with a huge amount of data. Every shader switch is an expensive operation.

Or you might suggest: "Evaluate the SDFs on the CPU, saving the results to a texture" (discretize the SDF field, essentially). Alas, my game, even more than most, is horrendously CPU-bound. Since it generates the world as the player moves around, it needs all the spare cycles it can get. So, I would really like to evaluate the SDFs on the GPU, which means streaming them somehow.

It seemed I was stuck.

There is a way, though (more than one, even). I consider solving this SDF streaming problem, and building the related surrounding machinery, among my primary accomplishments in this project, and they are the focus of the rest of this series.

Taking a Step Back

Before I leave you, let's recap:

  • We have three main concepts at play: shapes, operations, and transforms.
  • An object is made from some combination of those.
  • We want lots of objects, and to dynamically add/remove them in our scene.
  • We need some data-driven approach, since frequently re-building and/or swapping shaders is a no-go.

So, how do we encode those concepts as data? How do we send that data to the GPU? And what kind of shader consumes that data, calculating distance and ultimately coloring in pixels on the screen?

Those are the burning questions. See you in part 2!

Thanks for reading (:
– John


Footnotes:

  1. Sometimes, I get annoyed when I read some article that finishes with "I'll cover all this other interesting stuff in a future post!" And then I realize it's from 10 years ago, and there's no update coming...
    Don't worry, dear reader, this series is fully-formed from the start (:
  2. Okay, there is more to it — color spaces, compression, etc. But it's still a pretty direct path from stored image data to lit pixels.
  3. Interestingly (at least to me), Wikipedia only mentions "bit block transfer" (no "image"); I've always heard it the other way. There is some tangled etymology.
  4. There are some fun "fake 3D" effects that can be done with 2D sprites (eg: the bad guys in Wolfenstein or DOOM), but we're skipping along, here.
  5. Though that term is used for other things, too. Life can't be simple.
  6. A few links, for those interested:
  7. Well-established today; not so much in 1982 when Blinn's paper was published.
  8. Another term you may see is "polygonization."
  9. Though, to be fair, just about the very first thing we do with SDFs is abuse that notion so it's not strictly true (but still close enough to be useful).
  10. In a "true" SDF, distance changes by 1 unit whenever we move spatially by 1 unit — that's the definition of distance, after all. Hence the slope of 1.
  11. Search for "Scale - exact" on iq's distfunctions page.
