Hello, I'm the author so I'll take some time to explain this proposal.
First, the shader situation. This API's shader solution is a script called shaderbuild.py. It's essentially a frontend for offline shader building tools on the client's machine, and it bundles the various output formats together so that each one can be sent to the appropriate render backend. I'm aware that the original proposal included online shader compilation. This solution doesn't forbid that: in the future, SDLSL source could simply be included in the binary, and the CreateShaderModule function could translate it into the desired backend's bytecode on the fly. We wouldn't have to break the public API to allow this, which is a nice plus, and it keeps the shader compiler from being a blocker on using this API. When authoring a shader, the API expects certain set layouts depending on the resource and shader stage: vertex samplers are set 0, fragment samplers are set 1, vertex uniforms are set 2, and fragment uniforms are set 3.
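To make the set layout rule above concrete, here is a minimal sketch in C. The enum and function names are hypothetical and exist only for illustration; they are not part of the proposed API. The only thing taken from the proposal is the mapping itself (vertex samplers 0, fragment samplers 1, vertex uniforms 2, fragment uniforms 3).

```c
#include <assert.h>

/* Hypothetical names for illustration only; not part of the proposed API. */
typedef enum { STAGE_VERTEX, STAGE_FRAGMENT } ShaderStage;
typedef enum { RESOURCE_SAMPLER, RESOURCE_UNIFORM } ResourceKind;

/* Returns the set index that a shader is expected to use for the given
 * resource kind and shader stage, following the layout described above. */
static int expected_set_index(ShaderStage stage, ResourceKind kind)
{
    assert(stage == STAGE_VERTEX || stage == STAGE_FRAGMENT);
    assert(kind == RESOURCE_SAMPLER || kind == RESOURCE_UNIFORM);

    /* Samplers occupy sets 0-1 and uniforms occupy sets 2-3;
     * within each pair, the vertex stage comes before the fragment stage. */
    return (kind == RESOURCE_UNIFORM ? 2 : 0) + (stage == STAGE_FRAGMENT ? 1 : 0);
}
```

For example, expected_set_index(STAGE_FRAGMENT, RESOURCE_UNIFORM) evaluates to 3, so a fragment shader's uniform block would be declared in set 3.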
This is a modern-style rendering API, so almost all tasks occur in a deferred context and are broken up into render passes, compute passes, and copy passes. All operations that write to a resource have the ability to cycle to avoid inter-frame dependencies - handles to graphics resources like GpuBuffers are just containers, so we can cycle which internal resource a handle refers to. There are some data quirks due to AMD D3D11 drivers being not great - this is why there are a few different WriteOptions enums. I'd like to get rid of these, but I haven't been able to fully work around the fact that the D3D11 data APIs do not work as advertised on AMD.
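As a rough illustration of the cycling idea mentioned above, here is a toy model in C. Everything in it (BufferContainer, InternalBuffer, container_prepare_write, the fixed pool size, the in_flight flag) is a hypothetical stand-in I made up for this sketch; the real backends track GPU usage differently, so treat it as a picture of the concept rather than the actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_INTERNAL 4 /* arbitrary pool cap for this sketch */

/* Hypothetical stand-in for a backend resource. */
typedef struct {
    int  id;
    bool in_flight; /* true while submitted GPU work still references it */
} InternalBuffer;

/* Hypothetical stand-in for what a public handle points to:
 * a container that can switch which internal resource is active. */
typedef struct {
    InternalBuffer pool[MAX_INTERNAL];
    size_t count;  /* number of internal buffers created so far */
    size_t active; /* index of the buffer the handle currently refers to */
} BufferContainer;

static void container_init(BufferContainer *c)
{
    c->pool[0].id = 0;
    c->pool[0].in_flight = false;
    c->count = 1;
    c->active = 0;
}

/* Returns the internal buffer a write should target. When cycle is true and
 * the active buffer is still in flight, the container switches to an idle
 * buffer (creating one if the pool has room) so the write does not have to
 * wait on earlier work. */
static InternalBuffer *container_prepare_write(BufferContainer *c, bool cycle)
{
    assert(c != NULL && c->count >= 1);

    if (cycle && c->pool[c->active].in_flight) {
        for (size_t i = 0; i < c->count; i++) {
            if (!c->pool[i].in_flight) {
                c->active = i;
                return &c->pool[i];
            }
        }
        if (c->count < MAX_INTERNAL) {
            c->pool[c->count].id = (int)c->count;
            c->pool[c->count].in_flight = false;
            c->active = c->count;
            c->count++;
        }
        /* If the pool is exhausted we fall through and reuse the active
         * buffer; a real backend would need to synchronize here. */
    }
    return &c->pool[c->active];
}
```

The point of the sketch is that the client keeps holding the same container handle the whole time; only the internal resource behind it changes when a write cycles.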
Presentation is handled via SDL_GpuAcquireSwapchainTexture, which associates the given command buffer with a swapchain image. When SDL_GpuSubmit is called with this command buffer, the presentation structures are automatically configured and submitted. SDL_GpuSubmit automatically handles submission fences, but the client can choose to explicitly synchronize by calling SDL_GpuSubmitAndAcquireFence and using the returned SDL_GpuFence handle.
The rest of the API is fairly bog-standard binding, render, and compute dispatch calls.