Ubershaders: A Ridiculous Solution to an Impossible Problem (2017)

dolphin-emu.org

222 points by Grognak 2 years ago · 71 comments

phire 2 years ago

Has it really been 9 years since I started working on Ubershaders?

I'm a little surprised no better solution has come along. Vulkan didn't even exist back then (and DirectX 12 had only just been released), but instead of making things better, it digs its heels even deeper into the assumption that all shaders will be known ahead of time (resulting in long "shader recompilation" dialogs on startup in many games).

I've been tempted to build my own fast shader compiler into Dolphin for many common GPU architectures. Hell, it wouldn't even be a proper compiler, more of a templated emitter as all shaders fit a pattern. Register allocation and scheduling could all be pre-calculated.
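
To make that concrete, here's roughly what such a templated emitter could look like (a minimal sketch; the "machine words", opcode layout, and TevStage fields are all invented for illustration, not any real GPU ISA):

  #include <cstdint>
  #include <vector>

  // Invented stand-in for one decoded TEV stage (illustration only).
  // op: 0 = mul, 1 = add (must stay < 2 to index the template table).
  struct TevStage {
      uint8_t inputA, inputB, op, dest;
  };

  // Pre-scheduled instruction templates, one per op; registers and
  // issue slots were chosen ahead of time, so emission is copy + patch.
  static const uint32_t kStageTemplate[2] = {
      0x10000000u,  // fake "mul rD, rA, rB" with operand holes
      0x11000000u,  // fake "add rD, rA, rB"
  };
  static const uint32_t kClampTemplate = 0x20000000u;  // fake "clamp rD"

  std::vector<uint32_t> EmitShader(const std::vector<TevStage>& stages) {
      std::vector<uint32_t> code;
      for (const TevStage& s : stages) {
          // Patch the operand fields into the pre-scheduled words.
          code.push_back(kStageTemplate[s.op] | (s.inputA << 16) |
                         (s.inputB << 8) | s.dest);
          code.push_back(kClampTemplate | s.dest);
      }
      return code;  // ready-to-upload words, no compiler passes needed
  }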

But that would be even more insane than ubershaders, as it would mean one backend per GPU arch. And some drivers (like Nvidia's) don't provide a way to inject pre-compiled shader binaries.

On the positive side, ubershaders do solve the problem, and modern GPU drivers do a much better job at accepting ubershaders than they did 9 years ago. Though that's primarily because (as far as I'm aware) examples of Dolphin's ubershader have made their way into every single shader compiler test suite.

  • Akronymus 2 years ago

    >On the positive side, ubershaders do solve the problem, and modern GPU drivers do a much better job at accepting ubershaders than they did 9 years ago. Though that's primarily because (as far as I'm aware) examples of Dolphin's ubershader have made their way into every single shader compiler test suite.

    How'd that come to be? Just interesting code for test suites or did you guys advocate for it to be included?

  • Miksel12 2 years ago

    Don't you think an intermediate representation like SPIR-V would suffice to mostly eliminate stutter? Yuzu used that and shader stutter seemed to be minimal, and I can imagine that the shaders generated by Yuzu are much more complex than Dolphin's.

    • phire 2 years ago

      The only step that SPIR-V replaces is parsing the GLSL into an AST, and that's only a small part of the total time to compile a shader. Usually the bottleneck is register allocation or scheduling.

      Back when Vulkan was developed, there were a bunch of OpenGL drivers out there which had random AST parsing bugs (Dolphin even has a bunch of workarounds for them), so a large chunk of the motivation for SPIR-V was avoiding the need for every driver to implement its own GLSL parser and the associated bugs.

      The problem for Dolphin is not the complexity of the shader, but the quantity.

      Shaders in modern games are usually written manually (or authored in a shader node editor by an artist), so it's rare for a game to have more than a few thousand total. Some games might only have a few dozen for the entire game.

      But because GameCube/Wii games configure the TEV pixel pipeline through a dynamic API, some games use that API in a pattern where Dolphin can find itself generating hundreds of shaders per second. Some games even manage to generate new shaders continually as you play, because they append junk state to their pixel pipeline state which Dolphin doesn't detect as a duplicate.
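
      That duplicate-detection failure comes down to hashing raw state. A hedged sketch of the fix, canonicalizing before hashing (the struct layout and field names are invented for illustration):

        #include <cstddef>
        #include <cstdint>

        // Invented stand-in for captured TEV/pixel state (illustration only).
        struct PixelState {
            uint32_t tev_config[16];
            uint32_t num_stages;
            uint32_t unused_scratch;  // "junk" the game writes but never uses
        };

        // Canonicalize before hashing: zero anything that can't affect the
        // generated shader, so junk writes hash to the same key.
        uint64_t ShaderKey(PixelState s) {
            s.unused_scratch = 0;
            for (uint32_t i = s.num_stages; i < 16; ++i)
                s.tev_config[i] = 0;  // stages past num_stages are dead state
            uint64_t h = 0xcbf29ce484222325ull;  // FNV-1a, 64-bit
            const uint8_t* p = reinterpret_cast<const uint8_t*>(&s);
            for (size_t i = 0; i < sizeof(s); ++i)
                h = (h ^ p[i]) * 0x100000001b3ull;
            return h;
        }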

      • Jasper_ 2 years ago

        Shaving off the frontend costs is not going to be nothing. I don't know if Dolphin is still using FXC/D3DCompile or if they've switched to DXC, but FXC is infamously slow, even for very simple shaders. Dolphin's shaders are medium-complexity IIRC, so I'd expect removing the frontend to be a decent win.

        The driver PSO compilers aren't amazing but they're also not terrible. Most games do some form of hash-n-cache for PSO compilation and while stutters are still an issue, it's not the worst in the world. With the frontend gone, I'd expect ~50 shaders per second to be roughly stutter-free.

        Being smarter about specialization is probably a good idea -- having a blend between "GPU interpreter" and "full specialized pipeline" is where I think you should head. Several of the weirder TEV features could probably be moved to branching on dynamic buffer contents.

        Not to mention using newer features like bindless to merge draw calls. I always wanted to do that but got too busy before I stopped working on Dolphin :)

        • phire 2 years ago

          > so I'd expect removing the frontend to be a decent win

          I did some testing before working on ubershaders, and my modified build, which cached the bytecode output of FXC/D3DCompile (whatever Dolphin was using at the time), didn't reduce the stuttering by enough to be worth the effort of optimising the frontend.

          My conclusion was that it simply wasn't worth any effort to optimise for slightly smaller stutters, as they were still very perceptible to users. And hybrid ubershaders can hide any compile delays without any issue.

          And this testing was with FXC/D3DCompile, which does a bunch of optimisations. The fact that SPIR-V comes in (potentially) unoptimised means any Vulkan compiler has to send it through all the optimisation passes. Though I have been very tempted to do dead code removal before submitting the shaders, partly to make the shaders more readable to humans and partly to reduce the amount of code going through the various compiler passes.

          > Being smarter about specialization is probably a good idea -- having a blend between "GPU interpreter" and "full specialized pipeline" is where I think you should head.

          Yeah, that was always next on the list. Start with just ubershaders and then incrementally specialise on a background thread for the correct balance of shaders.
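
          The shape of that hybrid loop, as a hedged sketch (names invented; this is not Dolphin's actual code):

            #include <chrono>
            #include <cstdint>
            #include <future>
            #include <unordered_map>

            struct Pipeline;                             // opaque compiled GPU pipeline
            Pipeline* CompileSpecialized(uint64_t key);  // slow driver compile

            Pipeline* g_ubershader;  // always ready, slower to run
            std::unordered_map<uint64_t, std::shared_future<Pipeline*>> g_cache;

            // Called on the render thread with every draw's state key.
            Pipeline* GetPipeline(uint64_t key) {
                auto it = g_cache.find(key);
                if (it == g_cache.end()) {
                    // First sighting: kick off a background compile...
                    auto fut = std::async(std::launch::async, CompileSpecialized, key);
                    it = g_cache.emplace(key, fut.share()).first;
                }
                // ...and keep drawing with the ubershader until it lands.
                if (it->second.wait_for(std::chrono::seconds(0)) ==
                    std::future_status::ready)
                    return it->second.get();
                return g_ubershader;
            }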

          Dolphin's current specialised shaders are nowhere near fully specialised. We'd need to go further by baking some of the constants and lookup textures into the shader.

  • oppositelock 2 years ago

    Very cool work!

    I had to solve a similar problem years ago, during the transition from fixed function to shaders, when shaders weren't as fast or powerful as today. We started out with an ubershader approximating the DX9/OpenGL 1.2 fixed functions, but that was too slow.

    People in those days thought of rendering state as being stored in a tree, like the transform hierarchy, and you ended up having unpredictable state at the leaf nodes, sometimes leading to a very high number of permutations of possible states. At the time, I decomposed all possible pipeline state into atomic pieces, e.g., one light, fog function, texenv, etc. These were all annotated with inputs and outputs, and based on the state graph traversal, we'd generate a minimal shader for each particular material automatically, while giving old tools the semblance of being able to compose fixed-function states. As for you, doing this on-demand resulted in stuttering, but a single game only has so many possible states - from what I've seen, it's on the order of a few hundred to a few thousand. Once all shaders are generated, you can cache them and compile them all at startup time.
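
    A hedged sketch of that decomposition (the data shapes are invented for illustration): each atomic piece declares its dataflow, and a pass stitches together only the pieces the current state enables:

      #include <string>
      #include <vector>

      // One atomic piece of pipeline state, annotated with its dataflow.
      struct Fragment {
          std::vector<std::string> inputs;   // e.g. {"normal", "light0_dir"}
          std::vector<std::string> outputs;  // e.g. {"diffuse"}
          std::string glsl;                  // the snippet itself
      };

      // Walk the pieces enabled by the current state graph, assumed already
      // sorted so each piece's inputs are produced before it runs, and
      // concatenate a minimal shader for this material.
      std::string ComposeShader(const std::vector<Fragment>& enabled) {
          std::string body;
          for (const Fragment& f : enabled)
              body += f.glsl + "\n";
          return "void main() {\n" + body + "}\n";
      }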

    I wonder if something like this would work for emulating a Gamecube. You can definitely compute a signature for a game executable, and as you encounter new shaders, you can associate them with the game. Over time, you'll discover all the possible state, and if it's cached, you can compile all the cached shaders at startup.

    Anyhow, fun stuff. I used to love work like this. I've implemented 3DFx's Glide API on top of DX ages ago to play Voodoo games on my Nvidia cards, and contributed some code to an N64 emulator named UltraHLE.

    • M4v3R 2 years ago

      > contributed some code to an N64 emulator named UltraHLE

      That's a blast from the past. I distinctly remember reading up on UltraHLE way back when, then trying it out and, for the first time, being able to play Ocarina of Time on my middle-class PC with almost no issues. That was magical.

  • naikrovek 2 years ago

    I still don’t understand why you didn’t use the precompiled shaders packed with the games… you’re emulating the GameCube or Wii GPU, and it’s never going to change, and the games provide precompiled shaders.

    • phire 2 years ago

      First, the GameCube/Wii API actually generates the "shaders" at runtime, so there is simply no way to know which vertex/pixel pipeline states the game needs short of playing through the whole game, looking at every single bit of level geometry.

      Many games actually dynamically generate new "shaders" on the fly, based on which lights are near an object, and in which order.

      Second, we can't use those vertex/pixel pipeline states directly on a modern GPU; they need to be translated into modern shaders, and then compiled by the driver for your graphics card. It's actually that compile step which causes the stuttering; Dolphin's translation is plenty fast enough.

      The combination of these two facts means Dolphin can't depend on any pre-computation at all.

      • naikrovek 2 years ago

        I don’t get it (this is not your fault, it’s mine) but I believe you.

        • rkachowski 2 years ago

          if I understand correctly

          1. "shader" is just a metaphor, the actual code running on the gamecube gpu is a custom pipeline that has a dynamic structure and is updated aggressively throughout the lifetime of the app - there is no static "shader" program to run on the host GPU.

          2. The architectures of the GameCube and modern GPUs are so distinct as to require an intricate translation layer to map GameCube rendering operations onto first-class shader operations on a modern GPU. This very process causes the stuttering that starts the issue.

          • bonzini 2 years ago

            Translation is not intricate, but modern graphics cards are not tuned for dynamically setting up shaders.

    • TomatoCo 2 years ago

      That's the trick: they actually don't provide precompiled shaders as you know them. The graphics hardware back then was a fixed-function pipeline with a tremendous number of options to configure how it worked. The downside is that you can't run truly arbitrary code, but the upside is that it can switch behavior instantaneously, as fast as setting a register.

      Prior to ubershaders, the emulator took a configuration for the hardware pipeline and turned that into a shader, which took time to compile. Ubershaders work by emulating the entire fixed-function pipeline in one glorious shader until the smaller, more efficient shader can be compiled and slipped in.

      Basically, the ubershader is the only thing that can actually understand the "shaders" packaged with the game and start using them with zero latency.
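
      In shader terms, the core of the idea is a loop over pipeline configuration uploaded as plain data. A heavily simplified sketch (one op and one texture; this is not Dolphin's real ubershader, which covers the whole TEV):

        // GLSL fragment shader that interprets per-stage config from
        // uniforms instead of baking it in at compile time (sketch).
        const char* kUbershaderSketch = R"(
            #version 330 core
            uniform uint  num_stages;
            uniform uvec4 stage_cfg[16];   // packed per-stage TEV state
            uniform sampler2D tex0;
            in  vec2 uv;
            out vec4 frag;
            void main() {
                vec4 acc = vec4(0.0);
                for (uint i = 0u; i < num_stages; ++i) {
                    uvec4 cfg = stage_cfg[i];
                    vec4 a = texture(tex0, uv);
                    vec4 b = vec4(float(cfg.y) / 255.0);  // a constant input
                    // Branch on the configured op, like the hardware would.
                    acc = (cfg.z == 0u) ? a * b : acc + a * b;
                }
                frag = acc;
            }
        )";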

      Why not just precompile all the possible hardware combinations? There are far more combinations than atoms in the universe. Why not just precompile all the hardware combinations that the game actually uses? There's no way to tell beforehand without examining every branch of the game's code, which ranges in difficulty from "computationally prohibitive" to "fundamental theorems of how computers work say this is impossible".

      The article mentions that some users actually passed around cached shader packs, but that solution was brittle.

    • phkahler 2 years ago

      Wait, I thought that's what the ubershaders are. What you say is what I kept thinking for much of the article - "just" emulate the GPU, no compiler needed. And then they did.

      • naikrovek 2 years ago

        Maybe, lol, there were a lot of terms in that article that I didn’t understand well enough to claim I understood the article in toto.

        • pests 2 years ago

          One thing to remember is these older consoles don't have the same concept of a "shader" as we do today.

          Go back far enough and you'll find the industry trying to settle on quads or triangles for rendering (and we all know who won).

          The games were given basically an immediate mode API into the graphics card and they could do whatever they wanted, whenever they wanted, without warning.

          The stutter happened when they were translating the API mentioned above into modern GPU shaders.

          When it was on the CPU, they had to determine the effect, generate and compile the modern shaders, and upload them to the GPU, sometimes hundreds of times a second. Then the GPU would take over and display.

          Ubershaders took that entire pipeline and moved it onto the GPU.

          This was low-level emulation, just one that still hit the limits of modern CPUs.

    • ErneX 2 years ago

      PC games that have a shader precompile step usually have to redo it when new drivers come out. Precompiled shaders can be shipped to closed systems such as consoles or even the Steam Deck, but not to PCs: each GPU brand requires different ones, and, like I said, they change even when you update drivers.

    • mappu 2 years ago

      They're precompiled for the console GPU architecture, not the PC architecture, so they can't be used directly and still need to be emulated - I think those precompiled shaders are the input to the ubershader.

      • naikrovek 2 years ago

        The GAMES THEMSELVES are precompiled for the PowerPC architecture, not the PC architecture, though. That didn’t stop anyone from creating Dolphin.

        GPUs (I’m told) have far fewer instructions to emulate than a CPU, so I’d think that low level emulation of the Flipper shaders would be no trouble. Can’t translate or transpile them to PC GPUs though because those instruction sets are somewhat secret, I think.

        I know nothing about this stuff but I am a developer so perhaps I know enough to ask the most stupid questions possible.

        It’s gotta be a performance thing, why they didn’t emulate Flipper at a low enough level to use the precompiled shaders directly.

        • mappu 2 years ago

          > because those instruction sets are somewhat secret, I think

          The GPU ISAs are known (e.g. the PTX compiler for NVidia is open source and has a backend in LLVM). The main problem is that the GPU ISA changes with every GPU hardware generation and manufacturer, so if you want to support Nvidia 3xxx + 4xxx + AMD VLIW + AMD GCN + ... you have to use the common denominator GLSL/HLSL/SPIR-V/whatever.

          > why they didn’t emulate Flipper at a low enough level to use the precompiled shaders directly.

          They did. Originally the GPU emulation was done on the CPU, and in 2017, the GPU emulator itself was moved into a shader (the "ubershader").

          The console game itself does not include shaders in text format like many PC games do.

          • mandarax8 2 years ago

            > The GPU ISAs are known (e.g. the PTX compiler for NVidia is open source and has a backend in LLVM)

            PTX is only an IR afaik, kinda like SPIR-V. It also goes through another compiler in the driver, so it doesn't really help here.

        • TomatoCo 2 years ago

          The ubershader is the thing that emulates Flipper at a low enough level to use the precompiled "shaders" directly. Prior to that the precompiled "shaders" were examined and recompiled into individual shaders, a process that took time.

          (Why "shaders" in quotes? Because they weren't shaders as we know them today but really more like lists of hardware flags for how to flow data through a fixed function pipeline)

        • saagarjha 2 years ago

          Yes, that’s exactly the point though. This is the same question as why you can’t emulate a game by precompiling its code, and this doesn’t work because that information isn’t available until you try to run the game. That’s why Dolphin has an interpreter/JIT.

          • HideousKojima 2 years ago

            >This is the same question as why you can’t emulate a game by precompiling its code, and this doesn’t work because that information isn’t available until you try to run the game.

            I mean technically you can, but it generally requires a bunch of inefficient jump tables, or alternatively a way to fall back to an interpreter or JIT for self modifying code.

sfink 2 years ago

It's interesting to see the parallels between this and an engine for a dynamic programming language. The one I'm most familiar with is JavaScript.

When you first need to run something, you run it on the interpreter (JS) / ubershader (Dolphin). But once you know it's going to be run repeatedly (rarely for JS, almost always for Dolphin), you kick off an async compilation to produce JIT code (JS) / a specialized shader (Dolphin). You continue running in the expensive mode (interpreter / ubershader) until the compilation is complete, then you switch over seamlessly.
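
The shared pattern, as a hedged sketch (the names and the warm-up threshold are invented):

  #include <atomic>
  #include <future>

  using Fn = int (*)(int);

  int Interpret(int arg);   // the always-correct, general slow path
  Fn  JitCompile();         // expensive; runs on a background thread

  std::atomic<Fn> g_fast{nullptr};
  std::future<void> g_compile;
  int g_hits = 0;

  int Run(int arg) {
      if (Fn f = g_fast.load()) return f(arg);    // specialized path ready
      if (++g_hits == 100 && !g_compile.valid())  // warm enough: compile async
          g_compile = std::async(std::launch::async,
                                 [] { g_fast.store(JitCompile()); });
      return Interpret(arg);  // keep running the general path meanwhile
  }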

  • shepherdjerred 2 years ago

    While JS can be interpreted, V8 and SpiderMonkey (the two most common JS engines) will _always_ compile before execution -- JS is _never_ directly interpreted these days (aside from more niche engines).

    https://v8.dev/blog/ignition-interpreter

    https://firefox-source-docs.mozilla.org/js/index.html#javasc...

    • sfink 2 years ago

      That's not exactly right. I believe you're mixing up bytecode generation with compilation. The source code will indeed be compiled down to bytecode, but that doesn't count since it does not change the generality. As in, bytecode is no more specialized than the original source code. It's the same with shaders — the ubershader does not interpret the original shader source text.

      Both V8 and SM will interpret the bytecode until it warms up enough to be compiled to specialized machine code. ("Warms up" == "is observed to execute enough times".) There are some subtle distinctions about whether the interpreter is implemented in C++ or generated by a variant of the JIT code compiler, but as with the shaders the main point is whether it's executed in a way that works for everything or is specialized to a particular purpose (and varying degrees of specialization are implemented, with various mechanisms for falling back to a more general execution mechanism if the specialization assumptions no longer hold).

      Your SpiderMonkey doc link points to a section named "JavaScript Interpreter". The title is correct, that section is indeed about the mechanisms for interpreting JavaScript [bytecode].

      The V8 link is a little tricky, since it leads off with "Code is initially compiled by a baseline compiler", but if you read a little further, it says "...the V8 team has built a new JavaScript interpreter, called Ignition, which can replace V8’s baseline compiler". Basically, V8 experimented for a while with dropping the interpreter, but for the reasons described well in that document, they went back to initially running in an interpreter. The article is quite nice and describes quite a bit about the tradeoffs involved. It's 8 years old, but I believe the overall picture isn't that different today.

      (Source: I am an engineer on the SpiderMonkey team.)

      • shepherdjerred 2 years ago

        I don't think your reply conflicts with my comment. I was clarifying that modern JS engines do not directly interpret JS. JS is always compiled to bytecode which is then interpreted or compiled to machine code.

  • mst 2 years ago

    I might have said "oh, _clever_" out loud when that first clicked as I read through the piece.

GaggiX 2 years ago

The shader compilation stutter reminds me of a video I recently saw where a developer solved the problem by running a large portion of his game during its first loading: https://youtu.be/oG-H-IfXUqI

The developer recorded himself playing the game, and during the game's first load, that entire playthrough is replayed at high speed in the background on the player's machine.

corysama 2 years ago

The pixel shading of the GameCube was slower than that of the OG Xbox. But it was quite a bit more flexible. Specifically, the GameCube could load a couple of textures, do a bit of math, then use that math to load some more texels. The Xbox could only load textures as the starting instructions before doing math and tried to make up for that with a few "do very specific math and load textures in a single instruction" ops.

But, still... Both GPUs were pretty well suited for this ubershader approach because they had a small, fixed limit on the number of instructions they could run. And, very strictly defined functionality for each instruction. They weren't really "shaders" as much as highly flexible fixed function stages that you could reasonably wedge in a text shader compiler as a front end and only get a moderate to high amount of complaints about how strict and limited the rules were for the assembly. I recall that both shading units could reasonably be fully specified as C structs that you manually packed into the GPU registers instead of using a shader compiler at all.
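
Roughly the kind of struct that implies (a hedged sketch with an invented field layout; not the real register format of either GPU):

  #include <cstdint>

  // One combiner stage as a packed register word (illustrative layout).
  struct CombinerStage {
      uint32_t input_a  : 4;  // which of ~16 sources feeds operand A
      uint32_t input_b  : 4;
      uint32_t input_c  : 4;
      uint32_t input_d  : 4;
      uint32_t op       : 3;  // add / sub / compare ...
      uint32_t bias     : 2;
      uint32_t scale    : 2;
      uint32_t clamp    : 1;
      uint32_t dest     : 2;  // which accumulator register to write
      uint32_t reserved : 6;
  };
  static_assert(sizeof(CombinerStage) == 4, "one register write per stage");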

  • phire 2 years ago

    > The Xbox could only load textures as the starting instructions before doing math and tried to make up for that with a few "do very specific math and load textures in a single instruction" ops.

    If you look closely, the TEV actually shares the same limitation; it's just that the traditional representation interleaves the texture fetch and math instructions (because the 3rd texture fetch "instruction" always feeds into the 3rd math "instruction", for example). There are two independent execution units, separated by a FIFO, with no way to feed results from the math unit back into texture fetch.

    The two GPUs are roughly equivalent. The only reason the OG Xbox is considered to "have pixel shaders" is that they were exposed with a pixel shader API, while TEV was only ever exposed with a "texture environment" based API. They are both clearly register combiners, with no control flow, but they sit right in the middle as GPUs were transitioning from register combiners to "proper" pixel shaders. The team that designed the GameCube's GPU went on to develop the first DirectX 9 GPU.

    I'm pretty sure the Xbox's pixel pipeline is slightly more capable, as TEV doesn't have the Dot3 instruction (and the Xbox also has programmable vertex shaders). But developers all abandoned the Xbox in 2005. TEV has a much better reputation for being flexible because TEV was used in the Wii all the way to ~2013. And graphics developers who were exposed to much better shaders on the Xbox, PS3 and PC got very good at back-porting those modern techniques to the more limited Wii. More than one studio created unofficial shader compilers for the Wii, so they could share the same shaders across PS3/Xbox/Wii/PC.

    > I recall that both shading units could reasonably be fully specified as C structs that you manually packed into the GPU registers instead of using a shader compiler at all.

    Yeah, not that they ever exposed that API.

    The GameCube had great support for recording display lists, so you could record a display list while you called the API commands to configure TEV and then call that display list later to quickly load the "shader". Some games even saved those display lists to disc (or maybe generated them from scratch with external tools) as a form of offline shader compilation.
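
    Using the GX API, that recording pattern looked roughly like this (a sketch assuming the homebrew libogc naming; the exact TEV calls a game would record vary):

      #include <gccore.h>  // devkitPro libogc GX API (assumption)

      static u8 dl[1024] ATTRIBUTE_ALIGN(32);  // display lists need 32-byte alignment
      static u32 dl_size;

      void RecordTevConfig(void) {
          GX_BeginDispList(dl, sizeof(dl));
          GX_SetNumTevStages(2);                   // these FIFO commands get
          GX_SetTevOp(GX_TEVSTAGE0, GX_MODULATE);  // captured into the list
          GX_SetTevOp(GX_TEVSTAGE1, GX_BLEND);
          dl_size = GX_EndDispList();              // bytes actually recorded
      }

      void UseTevConfig(void) {
          GX_CallDispList(dl, dl_size);  // reload the "shader" near-instantly
      }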

    • tom_ 2 years ago

      You could definitely define pixel shaders on the Xbox using a combiner-type struct. See the D3DPIXELSHADERDEF struct in the docs; from memory it's the equivalent of one of the NV register combiner extensions from OpenGL, with additional access to a secret extra stage ordinarily reserved for some fixed-function pipeline stuff.

  • bitwize 2 years ago

    ISTR the GC pipeline being fixed-function while the Xbox had a full-fat GPU (GeForce 3 variant) -- one of the reasons why the Xbox absolutely smoked the other sixth-gen consoles in terms of performance. Was I wrong?

    • phire 2 years ago

      Extremely common misconception (even among developers on those platforms)

      In reality, the OG Xbox and GameCube GPUs are almost identical in pixel shading capabilities (though the GameCube's vertex shading pipeline is legitimately fixed function, it is a very flexible one).

      Despite their roughly equal capabilities, they were exposed with very different APIs. The Xbox used the new-fangled "shader" style API that Microsoft was introducing to the industry at the time, while TEV used a very extended version of the older "texture environment" style API that was introduced with DirectX 7 and OpenGL 1.3.

      ----------------

      Edit: Actually, it might be better to explain from the other end:

      In a true fixed-function GPU like the PlayStation 2 and Dreamcast (or the OG PlayStation... but not the N64, which is a two-stage register combiner), the pixel pipeline is limited to just one basic equation. A single texture is sampled, and that sample is multiplied with a single color interpolated from the vertex colors (which were usually derived from lights). The flexibility of the equation was limited to replacing each input with a fixed value, and then enabling a few optional post-processing stages like depth-based fog, alpha cutout and/or blending with a few fixed blend equations.

      But the results from that single texel * vertex_color equation are limiting. A common technique to produce better results on such GPUs was "multi-texturing". Graphics developers of the era would render the same triangles two or more times, but with different textures and vertex colours, blending the result into the frame buffer. This was commonly used to achieve the illusion of more detailed textures, or texture based light-maps. Or the reflections on cars in racing games.

      But blending in the frame buffer is expensive, as it wastes a lot of memory bandwidth. The PS2 is hyper-optimised for this approach: it has the VUs, which can quickly generate multiple draws of the same geometry, and a fast embedded DRAM with enough read/write ports that it can do blending "for free". But in the PC world, GPUs started adding features to combine these multiple draw calls together and blend the result before writing to the frame buffer. The Voodoo 2 and Nvidia TNT (Twin Texel) from 1998 are examples of GPUs that supported this single-pass multi-texturing.

      DirectX and OpenGL provided the "texture environment" APIs that automatically used these new single-pass multi-texturing features when available, or would fall back to multi-pass rendering on older GPUs.

      But the actual hardware was often more flexible than what DirectX/OpenGL exposed, so vendors supplied "register combiner" OpenGL extensions that exposed the full functionality (this is why John Carmack used OpenGL: so he could create optimised per-GPU render paths for each GPU). And these register combiners could be "programmed" to produce pixel equations that were way more advanced than what could be achieved with multi-pass rendering, as they could pass more than one value between stages. And they started supporting 4 or 8 textures, plus enough math stages to combine the textures.

      Microsoft gave up trying to expose the full capabilities of these register combiners through the older texture environment APIs and introduced pixel shaders with DirectX 8, but they were just providing a new API for the features GPUs already had. The register combiner stages were simply renamed to "instructions".

      The Xbox is a register combiner with 4 texture fetch stages and 8 combiner stages, while the GameCube has 16 combiner stages and 8 texture fetches (well, it technically supports 16 texture fetches, but there are only 8 sets of UV coords).
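
      For flavor, this is what programming a combiner looked like through NVIDIA's GL_NV_register_combiners extension on GeForce-era hardware (a sketch of real API usage, abbreviated to one stage computing tex0 * primary_color):

        glEnable(GL_REGISTER_COMBINERS_NV);
        glCombinerParameteriNV(GL_NUM_GENERAL_COMBINERS_NV, 1);
        // General combiner 0: spare0.rgb = A * B = texture0 * primary color.
        glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_A_NV,
                          GL_TEXTURE0_ARB, GL_UNSIGNED_IDENTITY_NV, GL_RGB);
        glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_B_NV,
                          GL_PRIMARY_COLOR_NV, GL_UNSIGNED_IDENTITY_NV, GL_RGB);
        glCombinerOutputNV(GL_COMBINER0_NV, GL_RGB, GL_SPARE0_NV,
                           GL_DISCARD_NV, GL_DISCARD_NV, GL_NONE, GL_NONE,
                           GL_FALSE, GL_FALSE, GL_FALSE);
        // Final combiner: out = A*B + (1-A)*C + D; route spare0 straight out.
        glFinalCombinerInputNV(GL_VARIABLE_A_NV, GL_SPARE0_NV,
                               GL_UNSIGNED_IDENTITY_NV, GL_RGB);
        glFinalCombinerInputNV(GL_VARIABLE_B_NV, GL_ZERO,
                               GL_UNSIGNED_INVERT_NV, GL_RGB);  // B = 1.0
        glFinalCombinerInputNV(GL_VARIABLE_C_NV, GL_ZERO,
                               GL_UNSIGNED_IDENTITY_NV, GL_RGB);
        glFinalCombinerInputNV(GL_VARIABLE_D_NV, GL_ZERO,
                               GL_UNSIGNED_IDENTITY_NV, GL_RGB);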

    • corysama 2 years ago

      The Xbox had fairly capable vertex shaders. But phire's comment does a better job than I did of explaining the pixel capabilities of both machines.

dang 2 years ago

Discussed at the time:

Ubershaders: A Ridiculous Solution to an Impossible Problem - https://news.ycombinator.com/item?id=14884992 - July 2017 (88 comments)

popcar2 2 years ago

This is a really neat article because the Godot engine is adding Ubershaders as well to fix shader compilation stuttering: https://github.com/godotengine/godot/pull/90400

  • darzu 2 years ago

    Pretty sure these are very different phenomena that are both called "ubershaders". As far as I understand it, outside of Dolphin, when people say "ubershader" they mostly mean a large material shader that has branches and constants to handle all the material variants used in one game (e.g. wood door, metal shield, shiny forcefield). It's produced by the engine, so it's under the full control of the engine developers. This means instead of loading separate shaders for the wood door and metal shield you can just tweak pipeline parameters. It's a workaround for the various overheads, like compilation, involved in creating specialized shaders.

    But Dolphin's "Ubershader" is a different beast. It's about handling all the shader variants for _all_ Dolphin games (which are made in different engines) with one shader, and the variant parameters aren't passed as nice constants (data) but as shader programs (code) that need to be interpreted to be understood. It's more like a meta-shader that takes shaders as input and produces shaders as opposed to a "normal" ubershader which takes configuration to specialize it at runtime.

    I think that's right anyway. I haven't worked directly with ubershaders in either variety and my knowledge comes from building my own hobby engine and 3D pipelines.
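
    For contrast, a typical engine-style material ubershader might look like this (a hedged sketch, not from any particular engine):

      // One shader; material variants are selected by uniform flags
      // instead of by compiling separate programs (sketch).
      const char* kMaterialUbershader = R"(
          #version 330 core
          uniform int use_normal_map;   // material flags arrive as data
          uniform int use_emissive;
          uniform sampler2D albedo_tex, normal_tex, emissive_tex;
          in  vec2 uv;
          in  vec3 vertex_normal;
          out vec4 frag;
          void main() {
              vec3 n = (use_normal_map != 0)
                  ? normalize(texture(normal_tex, uv).xyz * 2.0 - 1.0)
                  : normalize(vertex_normal);
              vec3 color = texture(albedo_tex, uv).rgb * max(n.z, 0.0);
              if (use_emissive != 0)
                  color += texture(emissive_tex, uv).rgb;
              frag = vec4(color, 1.0);
          }
      )";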

    • 01HNNWZ0MV43FF 2 years ago

      It sounded to me like an interpreter that emulates the GameCube / Wii GPU inside a shader, but I didn't read it closely

doophus 2 years ago

What was the missing piece for "shader sharing"?

Would it be possible to build a web-hosted database of encountered shader configs against a game id, and have Dolphin fetch that list when a game launches and start doing async compilation?

When Dolphin encounters a new shader that wasn't in the db, it phones home to request that it be added to the list.

I feel an automated sharing solution would build up coverage pretty quickly, and finding a stutter would eventually be considered an achievement - "no-one's been here before!"
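
Client-side, the sketch is simple (hedged; HttpGet/HttpPost and the URL are hypothetical stand-ins for whatever transport you'd actually use):

  #include <cstdint>
  #include <string>
  #include <vector>

  std::vector<uint64_t> HttpGet(const std::string& url);  // hypothetical
  void HttpPost(const std::string& url, uint64_t key);    // hypothetical
  void CompileAsync(uint64_t shader_config_key);

  void OnGameLaunch(const std::string& game_id) {
      // Warm the cache with every config anyone has ever reported.
      for (uint64_t key : HttpGet("https://db.example/configs/" + game_id))
          CompileAsync(key);
  }

  void OnNewShaderConfig(const std::string& game_id, uint64_t key) {
      CompileAsync(key);  // still stutters this once...
      // ...but, once reported, never again for anyone.
      HttpPost("https://db.example/configs/" + game_id, key);
  }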

  • mandarax8 2 years ago

    Every shader depends on both the driver version and the model of your GPU itself, which means a lot of shaders. I think Valve has a version of this, though not without issues (GBs of shaders to download).

    • 91edec 2 years ago

      The Steam Deck does this; that's why it doesn't suffer from stutters.

      For normal PCs, realistically Valve/Steam are the only people who could solve or implement this for PC games, as they have the tech and platform to distribute it all. Even with all that, it's a crazy task to try and solve, due to all of the variations, and new patches for games require the shaders to be recompiled again.

    • aprilnya 2 years ago

      The idea here wasn’t to share the compiled shader, but to share the shader configurations that each game uses — the ones that are used to then compile the actual shader. So you would compile them all at game start from the configurations you downloaded.

    • jdiff 2 years ago

      To my knowledge, the massive size isn't from the drivers, but because transcoded video files are also slipped in with the shaders. Proton struggles with things like Media Foundation, so Valve transcodes videos on their end.

conorpo 2 years ago

Does anyone know why this isn't an issue for modern games on PC? I assume it's because more uniforms are used and the number of shaders that actually need to be compiled at runtime is minimized, not to mention that the graphics API is optimized to compile shaders in the format they are provided in. So is the issue with Dolphin that GameCube games would compile new shaders for lots of different configurations of effects/stages? Would some sort of preprocessor that converts shader compilations to some mini-ubershader with uniforms that can handle a lot of the different effects be feasible? And then, depending on how many completely different shaders there are, you would have many different mini-ubershaders?

  • db48x 2 years ago

    The programmer of a modern PC game knows that they will be using shaders, and can arrange for them all to be compiled and sent to the GPU during a loading screen. That eliminates the lag, because there is no delay when choosing which shader to use for the next triangle. On the other hand it makes the loading screen take longer.

    Meanwhile the programmer of a console game, not using shaders, could set GPU registers to any configuration they wanted just before rendering the next triangle. You have to actually play the game to find out what configurations it programs into the GPU, because those configurations are not neatly organized into a set of discrete shaders. Even then there is no guarantee that you found all possible configurations used by the game. The videos in the article provide a good example: the player fires a gun with luminous bullets, so on that frame the walls and floors need to be rendered with an extra light source. That requires reconfiguring the GPU to take that light source into account, then changing the configuration to render the weapon itself, then changing it again to render the HUD, and so on.

    Now imagine that you go to a place on a different level where the walls are not shiny, and it doesn't bother to render the walls with the extra light source. Or it renders them with extra vertex lighting but not extra specular lighting. Now combine that with every type of wall and floor in the game; they might all need a unique shader to be lit correctly by that one gun. To find all possible GPU configurations you need to fire that gun, and every other, near every single different type of wall and floor texture used in the game. And there are a dozen different guns.

    And then you need to do it all again while wearing the night-vision goggles, because that causes everything to be rendered with a different configuration yet again.

    Every one of those unique combinations needs to be made into a shader, and there’s just no way to be sure that you have actually collected all of them. Or you can write a single Ubershader that can, by using branches, loops, and other advanced tricks, emulate the entire capabilities of the emulated GPU. Then you can program the Ubershader by sending all of the emulated GPU’s register values as uniforms.
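
    On the host side, "sending the register values as uniforms" is just an upload per draw (a hedged sketch in plain OpenGL; assumes a loaded GL 3.1+ context and an already-created uniform buffer):

      #include <GL/gl.h>   // plus a loader exposing GL 3.1+ entry points
      #include <cstddef>
      #include <cstdint>

      // Upload the emulated GPU's current register block before a draw;
      // the ubershader branches on these values instead of being recompiled.
      void UploadRegisters(GLuint ubo, const uint32_t* regs, size_t count) {
          glBindBuffer(GL_UNIFORM_BUFFER, ubo);
          glBufferSubData(GL_UNIFORM_BUFFER, 0, count * sizeof(uint32_t), regs);
      }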

  • Jasper_ 2 years ago

    It is an issue with some modern games (I recently played a title that had a "Preparing Shaders..." loading screen); the main difference is that those games know the full set of what they need to do and can precompile most of it up-front, while an emulator like Dolphin needs to handle whatever the game throws at it on the fly.

    Also, a game might know what shaders it can skip and what it can't, but Dolphin can't skip shaders that aren't compiled, because it doesn't know what the game will do with the render (e.g. Miis work by rendering their heads once into a texture, and then reusing that. If Dolphin skips the render because the shader isn't ready, the Mii will just be missing forever).

    Some emulators handle this by sharing "shader caches" between users so that they have a better idea of what the game will use; Dolphin opted for a different solution here.

    • MBCook 2 years ago

      It’s not a problem for the PS/Xbox/Switch. They have known hardware, so everything can be compiled ahead of time.

      But from what I’ve heard it’s often still an issue on PCs (I’m a Mac guy). I’ve seen videos of shader compilation stutters, even in games with a precompilation step that’s supposed to avoid that.

      Digital Foundry has covered this many times. The link in a sibling comment to them on Eurogamer is a great place to start.

      • talldayo 2 years ago

        Steam works around this by letting users enable shader precompilation in settings. If you want a console-like experience (e.g. like the Steam Deck) and you don't care so much about storage space, you can toggle it on and eliminate the stutter before booting up. Most people leave this off, which really ruins the experience on shader-heavy engines like UE4.

        This is generally an everyone problem, though. If gaming on Mac was caught-up with where Linux is today, there would probably be a few precompilation steps there too. If you wanted to play Fallout 3 on your Android/iPhone device, it's the same story.

        • MBCook 2 years ago

          I’ve heard that helps a lot but I know new driver versions/etc can trigger a recompile. And the less common your setup the less likely it’s an issue.

          I don’t think it’s an issue on Apple Silicon Macs (not sure), because, like the iPhone, there is a very tiny list of variables, so precompiling is easy.

  • ploxiln 2 years ago

    It is an issue for some modern big titles on PC. Trying to find some links that give a somewhat general overview ...

    https://twistedvoxel.com/unreal-engine-5-pc-stuttering-issue...

    https://www.eurogamer.net/digitalfoundry-2022-df-direct-week...

  • ErneX 2 years ago

    This is a big issue in modern PC games. Games that don’t do a shader precompile before starting suffer from serious shader-compile stutters the first time you play, dropping frames every time a new variation is required for a particular frame. It’s actually bad.

  • ruined 2 years ago

    gcn games never anticipated shader compilation time because it didn't exist.

    shaders weren't a thing when the gcn released. it may be arguable, but nobody even used the word at the time. shader compilation time is an issue for modern games on the PC, and because of this, developers anticipate and work around it.

    on the gcn, specialized fixed-function pipelines were available, and could be composed by some limited configuration (literally, 24 instructions). you may think of this as a sort of proto-shader, but significantly, the fixed-function pipelines embody quite a lot of behavior in specialized and limited hardware that is now typically achieved in software on more versatile hardware.

    so, to replicate that specialized hardware, modern graphics hardware (which exposes its greater capability as simple computational primitives) must compile a shader program and run it. but on gcn, the tiny configuration of static hardware loads near-instantly.

  • rtpg 2 years ago

    I think it's because PC games know they need to deal with compilation, so they do it on the load screen or whatever. GC games can pre-compile them and just stuff them on the disc, so there's no compilation cost.

    • pjmlp 2 years ago

      GC shaders are loosely based on GLSL and the OpenGL compilation model, even though GX(2) isn't OpenGL.

      • Jasper_ 2 years ago

        No, GameCube's TEVs have nothing to do with GLSL. The reference to GX2 implies you're confusing it with Wii U, inexplicably.

        • pjmlp 2 years ago

          Yes, I mixed them up with the Wii and Wii U, no need for the "inexplicably" remark.

nightowl_games 2 years ago

I've thought about writing a GPU-side interpreter for SDF definitions for a while. I made an SDF shader generator that dumps out shaders with hard-coded values, but doing it with bytecode would be cool. I'm sure this has been done before...
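
For the curious, the core of such an interpreter is small (a hedged sketch; the opcodes and encoding are invented):

  // GLSL snippet: walk SDF bytecode from a uniform array and evaluate
  // a distance with a tiny value stack; a raymarcher would call sdf().
  const char* kSdfInterpreter = R"(
      uniform int  prog_len;
      uniform vec4 prog[64];   // x = opcode, yzw = parameters
      float sdf(vec3 p) {
          float stack[8];
          int sp = 0;
          for (int i = 0; i < prog_len; ++i) {
              int op = int(prog[i].x);
              if (op == 0) {         // SPHERE(radius)
                  stack[sp] = length(p) - prog[i].y;
                  sp++;
              } else if (op == 1) {  // TRANSLATE(offset), applied to p
                  p -= prog[i].yzw;
              } else if (op == 2) {  // UNION of the top two results
                  sp--;
                  stack[sp - 1] = min(stack[sp - 1], stack[sp]);
              }
          }
          return stack[0];
      }
  )";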

smallstepforman 2 years ago

I’m surprised to see that ubershaders still exist; most game engines have settled on a set of fit-for-purpose custom shaders with almost no conditionals (for performance reasons), which is the opposite of ubershaders.

  • mort96 2 years ago

    Why would a lack of conditionals remove the need for uber shaders..? Or why would the presence of conditionals cause a need for uber shaders? How are these connected?

    It's common to have many shader variants exactly because you want to avoid 'if' statements; conditionals are turned into shader-compile-time options. Uber shaders are about turning those compile-time options back into runtime options to have a fallback shader to run while the right shader variant is being compiled.
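
    Concretely, the same conditional in its two forms (sketch):

      // Variant approach: resolved at compile time, one specialized
      // shader per #define permutation.
      const char* kVariantSnippet = R"(
          #ifdef USE_FOG
              color = mix(color, fog_color, fog_factor);
          #endif
      )";

      // Ubershader approach: the same decision deferred to runtime
      // data, so one compiled shader covers both cases.
      const char* kUberSnippet = R"(
          if (use_fog != 0)
              color = mix(color, fog_color, fog_factor);
      )";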

  • petermcneeley 2 years ago

    The wins here are probably not as large as one might think https://advances.realtimerendering.com/s2016/s16_ramy_final....

    The fact is that a branch that the whole warp takes or does not take is relatively cheap on modern hardware, even when the condition is per-thread dependent.

DrNosferatu 2 years ago

Why not just cache the shader compilation output and save it to disk? Only stutters or object pops on the 1st run.
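
GL actually exposes that via GL_ARB_get_program_binary (a sketch; error handling omitted, and note the catch discussed upthread: the blob is invalidated whenever the driver or GPU changes):

  #include <GL/gl.h>   // plus a loader exposing GL 4.1+ entry points
  #include <vector>

  void SaveAndReload(GLuint prog) {
      // Save the driver-compiled blob of an already-linked program...
      GLint len = 0;
      glGetProgramiv(prog, GL_PROGRAM_BINARY_LENGTH, &len);
      std::vector<char> blob(len);
      GLenum format = 0;
      glGetProgramBinary(prog, len, nullptr, &format, blob.data());
      // ...write format + blob to disk, keyed by a shader config hash...

      // ...and on a later run, feed it back to skip compilation.
      glProgramBinary(prog, format, blob.data(), len);
  }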
