Secrets of Direct3D 12: Do RTV and DSV descriptors make any sense?

Sun 12 Nov 2023

This article is intended for programmers who use Direct3D 12. We will explore the topic of descriptors, especially Render Target View (RTV) and Depth Stencil View (DSV) descriptors. To understand the article, you should already know what they are and how to use them. For learning the basics, I recommend my earlier article “Direct3D 12: Long Way to Access Data”, where I described the resource binding model in D3D12. This article is something of a follow-up to that one. I also recommend checking the official “D3D12 Resource Binding Functional Spec”.

Descriptors in general

What is a “descriptor”? My personal definition would be that generally in computing, a descriptor is a small data structure that points to some larger data and describes its parameters. While a “pointer”, “identifier”, or “key” is typically just a single number that points or identifies the main object, a “descriptor” is typically a structure that also carries some parameters describing the object.
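To make the distinction concrete, here is a minimal C++ sketch. The names BufferDescriptor and ReadByte are made up for illustration and come from no real API. A raw pointer carries only an address, while the descriptor also carries the parameters needed to interpret the data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// A plain pointer: just an address, with no information about the data behind it.
using RawPointer = const std::uint8_t*;

// A hypothetical descriptor: the address plus parameters describing the data.
struct BufferDescriptor {
    const std::uint8_t* base;   // where the data starts
    std::size_t elementCount;   // how many elements are visible through this descriptor
    std::size_t strideBytes;    // distance in bytes between consecutive elements
};

// Because the descriptor carries its own parameters, an access can be
// bounds-checked and strided without any outside knowledge.
inline bool ReadByte(const BufferDescriptor& d, std::size_t index, std::uint8_t& out) {
    assert(d.base != nullptr);
    if (index >= d.elementCount)
        return false;           // out of range: nothing is read
    out = d.base[index * d.strideBytes];
    return true;
}
```

With a descriptor of 4 elements and a stride of 2 bytes over an 8-byte array, ReadByte(d, 3, v) reads the byte at offset 6, while ReadByte(d, 4, v) is rejected.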

Descriptors in D3D12

Descriptors in D3D12 are also called “views”. They mean the same thing. Functions like ID3D12Device::CreateShaderResourceView or CreateRenderTargetView set up a descriptor. Note this is different from Vulkan, where a “view” and a “descriptor” are different entities. The concept of a “view” is also present in relational databases. Just like in databases, a “view” points to the target data, but also specifies a way to look at it. In D3D12 it means, for example, that an SRV descriptor pointing to a texture can reinterpret its pixel format (e.g. with or without _SRGB) or limit access to only a selected range of mip levels or array slices.

Let’s talk about Constant Buffer View (CBV), Shader Resource View (SRV), and Unordered Access View (UAV) descriptors first. If created inside GPU-accessible descriptor heaps (class ID3D12DescriptorHeap, flag D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE), they can be bound to the graphics pipeline, as I described in detail in my previously mentioned article. Being part of GPU memory has some implications:

#1: Once set up with one of the Create*View functions, descriptors must remain alive and unchanged as long as the GPU may use them, that is, until the command buffer containing commands that use these descriptors finishes execution on the GPU, which we check by waiting or polling on a fence.

// Sometime earlier:
pDevice->CreateShaderResourceView(pResource, pDesc, DescriptorCPUHandle);

// While rendering a frame:
pCommandList->SetGraphicsRootDescriptorTable(RootParamIndex, DescriptorGPUHandle);
// Recording commands that use this descriptor...
pCommandList->Close();
pQueue->ExecuteCommandLists(1, &pCommandList);
pQueue->Signal(pFence, FrameFenceValue);

// Sometime later, after N frames, make sure that command list finished execution.
pFence->SetEventOnCompletion(FrameFenceValue, hEvent);
WaitForSingleObject(hEvent, INFINITE);

// !!! Only now you can change the descriptor under DescriptorCPUHandle/DescriptorGPUHandle
// !!! or release its containing ID3D12DescriptorHeap!

#2: Descriptors and the resources they point to are not tracked on the CPU side. If we have a texture, create a descriptor for it, then release the texture, bind the descriptor, and try to use it in a shader, bad things can happen. This is a use-after-free bug, which causes undefined behavior and likely ends in a GPU crash (TDR). This kind of bug is not detected by the D3D Debug Layer.

#3: We need to use barriers to properly synchronize access to the resource on the GPU (ResourceBarrier function). For example, if we write to a buffer or a texture as a UAV in one compute shader, then write or read from it from the next shader, we need to issue a UAV barrier between them (D3D12_RESOURCE_BARRIER_TYPE_UAV) to avoid a data hazard.
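One common way to respect implication #1 without stalling the CPU is to postpone the change until the fence has passed. Below is a minimal sketch of that idea in plain C++. DeferredReleaseQueue is a hypothetical helper, not part of D3D12, and it assumes entries are enqueued with non-decreasing fence values. In real code the completed value would come from ID3D12Fence::GetCompletedValue.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical helper: postpones an action (overwriting a descriptor, releasing
// a descriptor heap) until the GPU has passed the fence value guarding its last use.
class DeferredReleaseQueue {
public:
    // Call after signaling the fence: 'fenceValue' is the value signaled after
    // the last command list that may still reference the descriptor.
    void Enqueue(std::uint64_t fenceValue, std::function<void()> action) {
        pending_.push_back({fenceValue, std::move(action)});
    }

    // Call e.g. once per frame with the fence's completed value. Runs every
    // action whose fence value has been reached; returns how many were run.
    std::size_t Process(std::uint64_t completedValue) {
        std::size_t executed = 0;
        while (!pending_.empty() && pending_.front().fenceValue <= completedValue) {
            pending_.front().action();
            pending_.pop_front();
            ++executed;
        }
        return executed;
    }

    std::size_t PendingCount() const { return pending_.size(); }

private:
    struct Entry {
        std::uint64_t fenceValue;
        std::function<void()> action;
    };
    // Assumption: fence values arrive in non-decreasing order, so checking only
    // the front of the queue is sufficient.
    std::deque<Entry> pending_;
};
```

An action enqueued with fence value 5 stays pending while the completed value is 4 and runs on the first Process call that sees 5 or more.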

Descriptors on the GPU

Descriptors didn’t exist in the old graphics APIs: OpenGL, Direct3D 9, 11. They were only added in the new generation of APIs: Vulkan, Direct3D 12. This is one of several new types of objects we need to handle in these explicit low-level APIs along with barriers, command buffers, queues, etc. Are they some abstract constructs that only add unnecessary complexity? No. On the GPU, a descriptor is a real thing. Old APIs just hide this complexity by handling them inside the graphics driver.

Let’s see what they look like. Let’s write a very simple pixel shader in HLSL that just samples a texture:

Texture2D myTexture;
SamplerState mySampler;
float4 PSMain(float2 texCoord : TEXCOORD0) : SV_Target0
{
    return myTexture.Sample(mySampler, texCoord);
}

Let’s now compile it for AMD / RDNA2, so that we can see the ISA (assembly). (See the code and compilation output in Compiler Explorer). The assembly is:

asic(GFX10_3)
type(PS)
sgpr_count(14)
vgpr_count(4)
wave_size(64)
s_version     UC_VERSION_GFX10 | UC_VERSION_W64_BIT
s_inst_prefetch  0x0003
s_mov_b32     m0, s3
s_mov_b64     s[12:13], exec
s_wqm_b64     exec, exec
s_getpc_b64   s[0:1]
v_interp_p1_f32  v2, v0, attr0.x
v_interp_p1_f32  v0, v0, attr0.y
s_mov_b32     s0, s2
v_interp_p2_f32  v2, v1, attr0.x
v_interp_p2_f32  v0, v1, attr0.y
s_load_dwordx8  s[4:11], s[0:1], null  // !!!
s_load_dwordx4  s[0:3], s[0:1], 0x000020  // !!!
s_and_b64     exec, exec, s[12:13]
s_waitcnt     lgkmcnt(0)
image_sample  v[0:3], [v2,v0], s[4:11], s[0:3] dmask:0xf dim:SQ_RSRC_IMG_2D  // !!!
s_waitcnt     vmcnt(0)
v_cvt_pkrtz_f16_f32  v0, v0, v1
v_cvt_pkrtz_f16_f32  v2, v2, v3
s_mov_b64     exec, s[12:13]
exp           mrt0, v0, v0, v2, v2 done compr vm
s_endpgm

Don’t worry, you don’t need to know AMD GPU ISA to understand it. I only want to focus on selected instructions, which I marked with “!!!”. We wanted to sample one texture, but we have 3 memory operations here (not counting the final exp instruction that exports the result). The image_sample instruction obviously does the texture sampling. The s_load_dwordx8 and s_load_dwordx4 instructions placed earlier load the texture descriptor (8 dwords) and the sampler descriptor (4 dwords), respectively.

AMD assembly is not only available for viewing in Compiler Explorer and other tools, but also has publicly available documentation. See the “RDNA2” Instruction Set Architecture Reference Guide. By analyzing this long PDF, we can figure out more details. The image_sample instruction (documented in chapter 8.2.1. “Image instructions”, pages from 68) takes the following arguments:

  1. First one is a range of 4 vector registers v[0:3] used as a destination for the sampled RGBA color.
  2. Then, we have the first source argument: a set of 2 vector registers [v2,v0], which pass texture coordinates. As we can see, they come from v_interp_* instructions, as interpolated vertex attributes. (Vertex attribute interpolation is logically a fixed function operation executed before the pixel shader, but on AMD it becomes part of the pixel shader code.)
  3. Next argument is a range of scalar registers s[4:11] that has been loaded using the s_load_dwordx8 instruction. This is the descriptor pointing to the texture.
  4. Then, a range of other scalar registers s[0:3], loaded by the second load instruction, is our sampler.

Documentation of the actual descriptor structure can be found in chapter 8.2.6. “Image Resource”, pages from 75, where they are called “image resource T#”. This bit-packed data structure carries a 40-bit base address pointing to the texture data, as well as 16 bits for width, height, 4 bits for minimum/maximum mip level, among other parameters. As you can see, descriptors pointing to textures, buffers, and samplers really exist in the GPU hardware.
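To give a feeling for what “bit-packed” means here, below is a toy sketch in C++. The struct ToyImageDescriptor and its field positions are invented for illustration and do not match the real T# layout from the AMD documentation. It only borrows the idea of a 40-bit base address stored next to small size and mip fields.

```cpp
#include <cassert>
#include <cstdint>

// Toy 128-bit descriptor. Field positions are made up for illustration and
// do NOT match the real RDNA2 "image resource T#" layout.
struct ToyImageDescriptor {
    std::uint64_t word0;  // [39:0] base address >> 8, [43:40] min mip, [47:44] max mip
    std::uint64_t word1;  // [15:0] width - 1, [31:16] height - 1
};

inline ToyImageDescriptor PackToyImageDescriptor(std::uint64_t gpuAddress,
    std::uint32_t width, std::uint32_t height,
    std::uint32_t minMip, std::uint32_t maxMip)
{
    assert((gpuAddress & 0xFF) == 0);          // assumed 256-byte alignment
    assert((gpuAddress >> 8) < (1ull << 40));  // must fit in 40 bits
    assert(width >= 1 && width <= 65536 && height >= 1 && height <= 65536);
    assert(minMip <= maxMip && maxMip < 16);

    ToyImageDescriptor d{};
    d.word0 = (gpuAddress >> 8)
            | (static_cast<std::uint64_t>(minMip) << 40)
            | (static_cast<std::uint64_t>(maxMip) << 44);
    d.word1 = static_cast<std::uint64_t>(width - 1)
            | (static_cast<std::uint64_t>(height - 1) << 16);
    return d;
}

// Unpacking: what hardware would do when decoding the descriptor.
inline std::uint64_t BaseAddress(const ToyImageDescriptor& d) {
    return (d.word0 & ((1ull << 40) - 1)) << 8;
}
inline std::uint32_t Width(const ToyImageDescriptor& d) {
    return static_cast<std::uint32_t>(d.word1 & 0xFFFF) + 1;
}
inline std::uint32_t Height(const ToyImageDescriptor& d) {
    return static_cast<std::uint32_t>((d.word1 >> 16) & 0xFFFF) + 1;
}
inline std::uint32_t MinMip(const ToyImageDescriptor& d) {
    return static_cast<std::uint32_t>((d.word0 >> 40) & 0xF);
}
inline std::uint32_t MaxMip(const ToyImageDescriptor& d) {
    return static_cast<std::uint32_t>((d.word0 >> 44) & 0xF);
}
```

Packing is only a handful of shifts and ORs, and the result fits in 16 bytes, which is why such a structure can be fetched into a few scalar registers with a single load.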

CPU references to resources

A different way of referring to resources in D3D12 is used when we talk not about shaders reading or writing them while executing draw calls and compute dispatches, but about other kinds of GPU commands, like copies. For example, the simple function CopyResource takes ID3D12Resource pointers to the source and destination resources directly as its parameters, while the more complex CopyTextureRegion function takes pointers to D3D12_TEXTURE_COPY_LOCATION structures, each of which describes a texture or buffer region to be copied from/to and also contains a pointer to ID3D12Resource.

Note that these are also commands recorded to a command list. When executed on the GPU, they will also read or write some resources, just like shaders executing as part of draw calls and dispatches. In fact, clears and copies may end up as shader work on the GPU. Yet, they don’t use descriptors. A resource is specified explicitly when recording these commands. As such, it is known on the CPU timeline, which has some implications:

  1. They are tracked on the CPU side. If we have a texture, record a copy command from/to it to a command list, then release the texture before submitting that command list for execution, D3D Debug Layer will report it as an error.
  2. We don’t need to use barriers to properly synchronize access to the resource between these commands. For example, if we copy to a resource and then issue another copy to the same resource, we don’t need any barrier between them to avoid a write-after-write hazard. There isn’t even any way to express such a barrier. The driver implicitly executes a barrier before and after every such operation. I explored this topic in my old article “Secrets of Direct3D 12: Copies to the Same Buffer”.

RTV and DSV descriptors

So far we talked about descriptors that are used by shaders executing on the GPU, as an indirection in accessing buffers and textures, as CBV, SRV, or UAV. Samplers are a bit different, because they don’t point to any big block of data. They only carry a standalone set of parameters, e.g. AddressU = WRAP, Filter = MIN_MAG_MIP_LINEAR. In other aspects, however, they behave similarly to the descriptors mentioned above.

Finally, here we are in a strange space between two worlds. Let’s focus on Render Target View (RTV) and Depth Stencil View (DSV) descriptors. Every time we want to use a texture as a render target or depth-stencil, we need to use such a descriptor. This includes binding them as RT+DS for rendering (function OMSetRenderTargets), as well as clearing them (functions ClearRenderTargetView, ClearDepthStencilView). Interestingly, this doesn’t apply to function DiscardResource, which is logically similar to a clear, and yet it takes an ID3D12Resource pointer directly.

RTV and DSV descriptors need to be created in dedicated types of descriptor heaps: D3D12_DESCRIPTOR_HEAP_TYPE_RTV, D3D12_DESCRIPTOR_HEAP_TYPE_DSV. They need to stay separate from the D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV. They always stay only on the CPU side. We cannot even create an RTV or DSV descriptor heap with flag D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE. Thus, they actually behave more like the CPU-side references to resources we discussed earlier with the CopyResource or CopyTextureRegion functions than like the CBV/SRV/UAV descriptors used by shaders. As proof, let’s consider three points, like we did earlier:

#1: Required lifetime of an RTV/DSV descriptor is only until the call to a function recording the command to the command buffer, e.g. ClearRenderTargetView, OMSetRenderTargets. This is specified in the “D3D12 Resource Binding Functional Spec”, chapter “Non Shader Visible Descriptor Heaps”:

As soon as a bind call, like SetRenderTargets() on the command list, returns back to the app, the source (non shader visible) descriptor heap location is free to be immediately changed by the application in preparation for the next call. In other words the driver doesn’t hold a reference to application provided memory.

// Sometime earlier:
pDevice->CreateRenderTargetView(pResource, pDesc, DescriptorCPUHandle);

// While rendering a frame:
pCommandList->ClearRenderTargetView(DescriptorCPUHandle, Color, 0, NULL);
pCommandList->OMSetRenderTargets(1, &DescriptorCPUHandle, TRUE, NULL);

// !!! Now you can already change the descriptor under DescriptorCPUHandle
// !!! or release its containing ID3D12DescriptorHeap!

// Recording commands that render into that render target...
pCommandList->Close();
pQueue->ExecuteCommandLists(1, &pCommandList);
pQueue->Signal(pFence, FrameFenceValue);

// Sometime later, after N frames, make sure that command list finished execution.
pFence->SetEventOnCompletion(FrameFenceValue, hEvent);
WaitForSingleObject(hEvent, INFINITE);

#2: They are tracked on the CPU side. If we have an RT or DS texture, record a clear command for it to a command list, then release the texture before submitting that command list for execution, the D3D Debug Layer will report it as an error.

#3: We don’t need to use barriers to properly synchronize access to the resource between these commands. For example, if we clear a render target texture, then issue a sequence of draw calls that render to it as the render target, they are all guaranteed to execute in order without any write-after-write hazard.
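The difference in required lifetime from point #1 can be mimicked with a toy model in plain C++. ToyView and ToyCommandList are made-up names, and how a real driver stores this data internally is not specified. The sketch only reproduces the observable rule: an RTV/DSV-style call copies the view while recording, while a CBV/SRV/UAV-style call remembers only the location, which is read at execution time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for the contents of a view: which resource and which mip level.
struct ToyView {
    int resourceId;
    int mipLevel;
};

// Toy command list demonstrating the two lifetime rules.
class ToyCommandList {
public:
    // RTV/DSV style: the view is copied while recording, so the caller may
    // overwrite the source slot as soon as this call returns.
    void SetRenderTarget(const ToyView& cpuSlot) {
        recordedTargets_.push_back(cpuSlot);
    }

    // CBV/SRV/UAV style: only the location is recorded. The slot is read
    // later, at "execution" time, so it must stay alive and unchanged.
    void SetShaderResource(const ToyView* gpuVisibleSlot) {
        recordedSlots_.push_back(gpuVisibleSlot);
    }

    // "Execution": returns the resource id that each command ends up using.
    int ExecutedRenderTarget(std::size_t i) const {
        assert(i < recordedTargets_.size());
        return recordedTargets_[i].resourceId;
    }
    int ExecutedShaderResource(std::size_t i) const {
        assert(i < recordedSlots_.size());
        return recordedSlots_[i]->resourceId;
    }

private:
    std::vector<ToyView> recordedTargets_;      // snapshots taken at record time
    std::vector<const ToyView*> recordedSlots_; // references resolved at execution time
};
```

If both slots are overwritten after recording, the render target command still uses the view as it was at record time, while the shader resource command picks up the modified slot, which in real D3D12 would be a bug.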

Do they make sense?

Let me get to my main point: As you can see, RTV and DSV descriptors in Direct3D 12 don’t behave like normal descriptors. They don’t exist on the GPU. In my opinion, they make no sense. They are just an unnecessary complication. Functions like OMSetRenderTargets or ClearRenderTargetView should take a structure with an ID3D12Resource pointer directly, just like CopyTextureRegion does.

Can there be a reason why RTV and DSV descriptors exist and have to be used with clear operations and binding RT/DS textures, while other types of operations don’t need them? Maybe it gives Microsoft’s DirectX runtime or the graphics driver an opportunity to pre-bake these parameters into some specific binary format, just like GPU descriptors do? I stepped into the disassembly of the CreateRenderTargetView function in the Visual Studio debugger and I eventually found myself inside the .dll file of the D3D12 graphics driver (“nvwgf2umx.dll” for Nvidia, “amdxc64.dll” for AMD). Still, this doesn’t sound like a good justification to me. How expensive can it be to copy and bit-pack several pointers and numbers? We could just as well be asked to pre-bake the parameters of the CopyTextureRegion function or even DrawIndexedInstanced, yet we can pass their parameters directly when recording commands to a command list.

My helper libraries

To further demonstrate my point, I’ve written two small C++ libraries and posted them to GitHub repository: D3D12DescriptorHelpers.

RenderTargetHelper provides a class that makes issuing OMSetRenderTargets, ClearRenderTargetView, and ClearDepthStencilView commands easier. You can create just one object of the RenderTargetHelper class per entire application or, in case of multithreaded rendering, per render thread or per ID3D12CommandAllocator. It creates just 8 RTV descriptors and 1 DSV descriptor internally, because this is how many you may need at once when setting up multiple render targets. Then, you can clear and set render target and depth stencil textures by passing ID3D12Resource pointers and DESC structures directly. You can completely forget about RTV or DSV descriptors. Note that passing a DESC structure is optional. When NULL, default parameters are assumed, which is usually sufficient, unless you want to target a specific mip level. In practice, using render targets simplifies to:

// When initializing the entire engine:
RenderTargetHelper RTHelper;
RTHelper.Init(pDevice);

// While rendering a frame:
RTHelper.ClearRenderTargetView(pCommandList, pRenderTargetTexture, NULL, Color, 0, NULL);
RTHelper.OMSetRenderTargets(pCommandList, 1, &pRenderTargetTexture, NULL, NULL, NULL);
// Recording commands that render into that render target...
pCommandList->Close();
pQueue->ExecuteCommandLists(1, &pCommandList);
pQueue->Signal(pFence, FrameFenceValue);

// Sometime later, after N frames, make sure that command list finished execution.
pFence->SetEventOnCompletion(FrameFenceValue, hEvent);
WaitForSingleObject(hEvent, INFINITE);

CopyTextureRegionUnhelper is a nonsensical library that provides classes and functions with a custom concept of a “copying descriptor”. It wraps the CopyTextureRegion function into one that asks you to set up and use this new kind of descriptor. Of course, using it makes no sense. It just adds another level of indirection and introduces unnecessary complication. I’ve written it just to demonstrate my point. Let’s thank Microsoft they didn’t ask us to do it this way! With it, a simple copy from a buffer to a texture would become:

// When initializing the entire engine:
D3D12_COPYING_DESCRIPTOR_HEAP_DESC CopyingDescHeapDesc = {};
CopyingDescHeapDesc.NumDescriptors = 1000;
ICopyingDescriptorHeap* pCopyingDescHeap;
CreateCopyingDescriptorHeap(&CopyingDescHeapDesc, &pCopyingDescHeap);

// When creating a buffer:
D3D12_CPU_DESCRIPTOR_HANDLE BufDescHandle = pCopyingDescHeap->GetCPUDescriptorHandleForHeapStart();
COPYING_FOOTPRINT_VIEW_DESC BufCFVDesc = {
    Offset, { Format, Width, Height, Depth, RowPitch } };
CreateCopyingFootprintView(pBuf, &BufCFVDesc, BufDescHandle);

// When creating a texture:
D3D12_CPU_DESCRIPTOR_HANDLE TexDescHandle = BufDescHandle;
TexDescHandle.ptr += GetCopyingDescriptorHandleIncrementSize();
COPYING_SUBRESOURCE_VIEW_DESC TexCSVDesc = {
    .SubresourceIndex = 0 };
CreateCopyingSubresourceView(pTex, &TexCSVDesc, TexDescHandle);

// While rendering a frame and need to make a copy:
CopyTextureRegion(pCommandList, TexDescHandle, 0, 0, 0, BufDescHandle, NULL);
// Recording other commands...
pCommandList->Close();
pQueue->ExecuteCommandLists(1, &pCommandList);
// Etc...

Summary

Congratulations on reading this far. If you understood this article, you must know DirectX 12 well. I hope you learned something, or at least had a good and thought-provoking time reading it. As you could see, in DirectX 12, CBV, SRV, UAV, and sampler descriptors really exist on the GPU, used by shaders that we execute as part of our draw calls and compute dispatches. Among other types of commands, some take an ID3D12Resource pointer and additional parameters directly, e.g. CopyResource, CopyTextureRegion, DiscardResource. Only for the functions OMSetRenderTargets, ClearRenderTargetView, and ClearDepthStencilView are we asked to use special CPU-only RTV and DSV descriptors, which can be encapsulated by a simple library like RenderTargetHelper.

Thanks to Jesse Natalie from Microsoft for pointing me to the right place in the D3D12 spec about RTV/DSV descriptors.
