Shapes and forms of DX12 root signatures

Tue, 14 May 2024

This article is for you if you are a programmer using Direct3D 12. We will talk about a specific part of the API: root signatures. I will provide a comprehensive description of the various formats in which they can be specified and stored, and of the ways to convert between them. The difficulty of this article is intermediate. You are expected to know at least some basics of D3D12. I think that advanced developers can also learn something new, as some of the topics shown here are not what we typically use in day-to-day development with D3D12.

Tools of the trade

I will use C++ as the programming language. Wherever possible, I will also try to use standalone command-line tools instead of writing custom code. To repeat the experiments demonstrated in this article, you will need two of them: DirectX Shader Compiler (DXC, "dxc.exe" from the Windows SDK) and Radeon GPU Analyzer (RGA).

You don't need to know the command-line syntax of these tools to understand the article. I will describe everything step-by-step.

Warning about DXC: If you also have the Vulkan SDK installed, your PATH environment variable very likely points to "dxc.exe" in that SDK instead of the Windows SDK, which can cause problems. To check this, run the command: where dxc. If you find the Vulkan SDK listed first, make sure you call "dxc.exe" from the Windows SDK, e.g. by explicitly specifying the full path to the executable file.

Warning about RGA: If you want to repeat command-line experiments presented here, make sure to use Radeon GPU Analyzer in the latest version, at least 2.9.1. In older versions, the commands I present wouldn't work.

Shader compilation

A side note about shader compilation: Native CPU code, like the one we create when compiling our C++ programs, is saved in .exe files. It contains instructions in a common format called x86, which are sent directly to the CPU for execution. It works regardless of whether you have an AMD or Intel processor in your computer, because they comply with the same standard. With programs written for the GPU (which we call shaders), things are different. Every GPU vendor (AMD, Nvidia, Intel) has its own instruction set, necessitating a two-step process for shader compilation:

  1. As graphics programmers, we write shaders in high-level languages like HLSL or GLSL. We then compile them using a shader compiler like "dxc.exe" to a binary format. It is actually an intermediate format common to all GPU vendors, defined by Microsoft for Direct3D (called DXIL) or by Khronos for Vulkan (called SPIR-V). We are encouraged to compile our shaders offline and only ship these compiled binaries to end users.
  2. When our application uses a graphics API (like Direct3D 12 or Vulkan) and creates a pipeline state object (PSO), it specifies these shaders as inputs. This intermediate code then goes to the graphics driver, which performs the second stage of the compilation - it translates the code to instructions valid for the specific GPU (also called Instruction Set Architecture - ISA). We typically don't see this assembly code and we never write it directly, although inspecting it can be useful for optimizations. Nvidia's ISA is secret, but AMD and Intel publish documents describing theirs. The RGA tool can show the AMD ISA.

What is a root signature?

In Direct3D 12, a root signature is a data structure that describes resource bindings used by a pipeline on all the shader stages. Let's see an example. Let's work with file "Shader1.hlsl": a very simple HLSL code that contains 2 entry points: function VsMain for vertex shader and function PsMain for pixel shader:

struct VsInput
{
 float3 pos : POSITION;
 float2 tex_coord : TEXCOORD;
};
struct VsOutput
{
 float4 pos : SV_Position;
 float2 tex_coord : TEXCOORD;
};

struct VsConstants
{
 float4x4 model_view_proj;
};
ConstantBuffer<VsConstants> vs_constant_buffer : register(b4);

VsOutput VsMain(VsInput i)
{
 VsOutput o;
 o.pos = mul(float4(i.pos, 1.0), vs_constant_buffer.model_view_proj);
 o.tex_coord = i.tex_coord;
 return o;
}

Texture2D<float4> color_texture : register(t0);
SamplerState color_sampler : register(s0);

float4 PsMain(VsOutput i) : SV_Target
{
 return color_texture.Sample(color_sampler, i.tex_coord);
}

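I assume you already know that a shader is a program executed on a GPU that processes a single vertex or pixel with clearly defined inputs and outputs. To perform the work, it can also reach out to video memory to access additional resources, like buffers and textures. In the code shown above, the vertex shader reads the constant buffer vs_constant_buffer bound to register b4, while the pixel shader samples the texture color_texture bound to t0 using the sampler color_sampler bound to s0.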

A root signature is a data structure that describes what I said above - what resources should be bound to the pipeline at individual shader stages. In this specific example, it will be a constant buffer at register b4, a texture at t0, and a sampler at s0. It can also be shown in the form of a table:

Root param index    Register    Shader stage
0                   b4          VS
1                   t0          PS
2                   s0          PS

I am simplifying things here, because this article is not about teaching you the basics of root signatures. For more information about them, you can check:

To prepare for our experiments, let's compile the shaders shown above using commands:

dxc -T vs_6_0 -E VsMain -Fo Shader1.vs.bin Shader1.hlsl
dxc -T ps_6_0 -E PsMain -Fo Shader1.ps.bin Shader1.hlsl

Note that a single HLSL source file can contain multiple functions (VsMain, PsMain). When we compile it, we need to specify one function as an entry point. For example, the first command compiles "Shader1.hlsl" file using VsMain function as the entry point (-E parameter) treated as a vertex shader in Shader Model 6.0 (-T parameter). Similarly, the second command compiles PsMain function as a pixel shader. Compiled shaders are saved in two separate files: "Shader1.vs.bin" and "Shader1.ps.bin".

#1. Data structure

It is time to show some C++ code. Imagine we have D3D12 already initialized, our compiled shaders loaded from files to memory, and now we want to render something on the screen. I said a root signature is a data structure, and indeed, we can create one by filling in some structures. The main one is D3D12_ROOT_SIGNATURE_DESC. Let's fill in the structures according to the table above.

// There will be 3 root parameters.
D3D12_ROOT_PARAMETER root_params[3] = {};

// Root param 0: CBV at b4, passed as descriptor table, visible to VS.
D3D12_DESCRIPTOR_RANGE vs_constant_buffer_desc_range = {};
vs_constant_buffer_desc_range.RangeType = D3D12_DESCRIPTOR_RANGE_TYPE_CBV;
vs_constant_buffer_desc_range.NumDescriptors = 1;
vs_constant_buffer_desc_range.BaseShaderRegister = 4; // b4

root_params[0].ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
root_params[0].ShaderVisibility = D3D12_SHADER_VISIBILITY_VERTEX;
root_params[0].DescriptorTable.NumDescriptorRanges = 1;
root_params[0].DescriptorTable.pDescriptorRanges = &vs_constant_buffer_desc_range;

// Root param 1: SRV at t0, passed as descriptor table, visible to PS.
D3D12_DESCRIPTOR_RANGE color_texture_desc_range = {};
color_texture_desc_range.RangeType = D3D12_DESCRIPTOR_RANGE_TYPE_SRV;
color_texture_desc_range.NumDescriptors = 1;
color_texture_desc_range.BaseShaderRegister = 0; // t0

root_params[1].ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
root_params[1].ShaderVisibility = D3D12_SHADER_VISIBILITY_PIXEL;
root_params[1].DescriptorTable.NumDescriptorRanges = 1;
root_params[1].DescriptorTable.pDescriptorRanges = &color_texture_desc_range;

// Root param 2: sampler at s0, passed as descriptor table, visible to PS.
D3D12_DESCRIPTOR_RANGE color_sampler_desc_range = {};
color_sampler_desc_range.RangeType = D3D12_DESCRIPTOR_RANGE_TYPE_SAMPLER;
color_sampler_desc_range.NumDescriptors = 1;
color_sampler_desc_range.BaseShaderRegister = 0; // s0

root_params[2].ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
root_params[2].ShaderVisibility = D3D12_SHADER_VISIBILITY_PIXEL;
root_params[2].DescriptorTable.NumDescriptorRanges = 1;
root_params[2].DescriptorTable.pDescriptorRanges = &color_sampler_desc_range;

// The main structure describing the whole root signature.
D3D12_ROOT_SIGNATURE_DESC root_sig_desc = {};
root_sig_desc.NumParameters = 3;
root_sig_desc.pParameters = root_params;
root_sig_desc.Flags = D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT;

The variable root_sig_desc of type D3D12_ROOT_SIGNATURE_DESC is our data structure specifying the root signature. Let's call it root signature representation number #1.

The code may look scary at first, but if you analyze it carefully, I am sure you can recognize the parameters of the 3 resources to bind that we talked about earlier. This code is so complex because a buffer or a texture can be bound in multiple ways, differing in the number of levels of indirection. Describing it is out of scope of this article, but I explained it comprehensively in my old article: Direct3D 12: Long Way to Access Data.

There is also an even more general structure D3D12_VERSIONED_ROOT_SIGNATURE_DESC that allows using root signatures in versions higher than 1.0, but we won't talk about it in this article, so as not to complicate things.

#2. Serialized root signature

If you also use Vulkan, you may recognize that the equivalent structure is VkDescriptorSetLayoutCreateInfo. From it, you can call function vkCreateDescriptorSetLayout to create an object of type VkDescriptorSetLayout, and then VkPipelineLayout, which is roughly equivalent to the DX12 root signature.

In DX12, however, this is not that simple. There is an intermediate step we need to go through. Microsoft requires converting this data structure to a special binary format first. They call it "serialization". We can do it using function D3D12SerializeRootSignature, like this:

ComPtr<ID3DBlob> root_sig_blob, error_blob;
HRESULT hr = D3D12SerializeRootSignature(&root_sig_desc, D3D_ROOT_SIGNATURE_VERSION_1_0,
    &root_sig_blob, &error_blob);
// Check hr...
const void* root_sig_data = root_sig_blob->GetBufferPointer();
size_t root_sig_data_size = root_sig_blob->GetBufferSize();

An object of type ID3DBlob is just a simple container that owns a memory buffer with binary data of some size. ("BLOB" stands for "Binary Large OBject".) This buffer we created here is our representation number #2 of the root signature.

If we save it to a file, we can see that our example root signature has 188 bytes. It starts with the characters "DXBC", just like the shaders we previously compiled with the dxc tool, which indicates that root signatures use the same container format as compiled shaders. I am not sure whether this binary format is documented anywhere. It should be possible to decipher anyway, as DirectX Shader Compiler (dxc) is open source. I never needed to work with this binary format directly, and we won't do it here either.
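The observation about the leading "DXBC" characters can be turned into a cheap sanity check on a loaded buffer. Here is a minimal sketch; the function name StartsWithDxbcMagic is my own invention for illustration, and the check only looks at the first four bytes without validating anything else in the container:

```cpp
#include <cstddef>
#include <cstring>

// Returns true if the buffer begins with the four characters "DXBC".
// Hypothetical helper: it inspects only the magic number and does not
// validate the rest of the container.
bool StartsWithDxbcMagic(const void* data, std::size_t size)
{
    static const char magic[4] = {'D', 'X', 'B', 'C'};
    return data != nullptr && size >= sizeof(magic) &&
        std::memcmp(data, magic, sizeof(magic)) == 0;
}
```

Such a check could be applied both to a serialized root signature and to a compiled shader, since both start with the same characters.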

I guess Microsoft's intention was to encourage developers to prepare root signatures beforehand and store them in files, just like compiled shaders, so they are not assembled at runtime on every application launch. Is it worth it, though? Shader compilation is slow for sure, but would loading a file be faster than filling in the data structure and serializing it with D3D12SerializeRootSignature? I doubt it, unless Microsoft implemented this function extremely inefficiently. Very likely, this additional level of indirection is just an extra unnecessary complication that Microsoft prepared for us. That wouldn't be the only case where they did it, as you can read in my old article Do RTV and DSV descriptors make any sense?

Note that if a serialized root signature is saved to a file and loaded later, it doesn't need to be stored in an ID3DBlob object. All we need is a pointer to the data and the size (number of bytes). The data can be stored in a byte array like char* arr = new char[size], or std::vector<char> (I like to use this one), or any other form.
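As a rough sketch of the save-and-load round trip described above, the following two helpers use only the C++ standard library. The names SaveBinaryFile and LoadBinaryFile are hypothetical, chosen just for this illustration:

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Writes a binary buffer (e.g. a serialized root signature) to a file.
bool SaveBinaryFile(const std::string& path, const void* data, std::size_t size)
{
    std::ofstream file(path, std::ios::binary | std::ios::trunc);
    if (!file)
        return false;
    file.write(static_cast<const char*>(data), static_cast<std::streamsize>(size));
    return file.good();
}

// Loads an entire file into memory. Returns false if the file cannot be read.
bool LoadBinaryFile(const std::string& path, std::vector<char>& out_data)
{
    // Open at the end of the file so that tellg() reports its size.
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file)
        return false;
    const std::streamsize size = file.tellg();
    if (size < 0)
        return false;
    out_data.resize(static_cast<std::size_t>(size));
    file.seekg(0, std::ios::beg);
    if (size > 0)
        file.read(out_data.data(), size);
    return file.good();
}
```

The data pointer and size of the ID3DBlob would go to SaveBinaryFile, and after loading, the data() and size() of the vector can be used wherever the serialized root signature is needed.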

#3. Root signature object

With this extra level of indirection done, we can use this serialized binary root signature to create an object of type ID3D12RootSignature. This is an opaque object that represents the root signature in memory, ready to be used by D3D12. Let's call it root signature representation number #3. The code for creating it is very simple:

ComPtr<ID3D12RootSignature> root_sig_obj;
hr = g_Device->CreateRootSignature(0, root_sig_data, root_sig_data_size,
    IID_PPV_ARGS(&root_sig_obj));
// Check hr...

#4. Pipeline state object

Having this root signature object, we can pass it as part of the D3D12_GRAPHICS_PIPELINE_STATE_DESC and use it to create an ID3D12PipelineState - a Pipeline State Object (PSO) that can be used for rendering.

D3D12_GRAPHICS_PIPELINE_STATE_DESC pso_desc = {};
pso_desc.pRootSignature = root_sig_obj.Get(); // Root signature!
pso_desc.VS.pShaderBytecode = vs_data; // Vertex shader from "Shader1.vs.bin".
pso_desc.VS.BytecodeLength = vs_data_size;
pso_desc.PS.pShaderBytecode = ps_data; // Pixel shader from "Shader1.ps.bin".
pso_desc.PS.BytecodeLength = ps_data_size;
pso_desc.RasterizerState.FillMode = D3D12_FILL_MODE_SOLID;
pso_desc.RasterizerState.CullMode = D3D12_CULL_MODE_NONE;
pso_desc.InputLayout.NumElements = _countof(input_elems);
pso_desc.InputLayout.pInputElementDescs = input_elems;
pso_desc.PrimitiveTopologyType = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE;
pso_desc.NumRenderTargets = 1;
pso_desc.RTVFormats[0] = DXGI_FORMAT_R8G8B8A8_UNORM_SRGB;
pso_desc.SampleDesc.Count = 1;

ComPtr<ID3D12PipelineState> pso;
hr = g_Device->CreateGraphicsPipelineState(&pso_desc, IID_PPV_ARGS(&pso));
// Check hr...

If we have the serialized root signature saved to a file "RootSigFromCode.bin", we can also play around with assembling a PSO without any coding, but using Radeon GPU Analyzer instead. Try the following command:

rga -s dx12 -c gfx1100 --all-hlsl Shader1.hlsl --all-model 6_0 --vs-entry VsMain --ps-entry PsMain --rs-bin RootSigFromCode.bin --offline --isa AMD_ISA

The meaning of individual parameters is:

On success, this command creates 2 text files with the disassembly of the vertex and pixel shader: "gfx1100_AMD_ISA_vert.isa", "gfx1100_AMD_ISA_pixel.isa". The pixel shader looks like this:

; D3D12 Shader Hash 0x46f0bbb15b95e2453380ad3c9765222a
; API PSO Hash 0xd96cc024d8cb165d
; Driver Internal Pipeline Hash 0xf3a0f055053cc59f
; -------- Disassembly --------------------
shader main
asic(GFX11)
type(PS)
sgpr_count(14)
vgpr_count(8)
wave_size(64)
                                                        // s_ps_state in s0
s_version     UC_VERSION_GFX11 | UC_VERSION_W64_BIT   // 000000000000: B0802006
s_set_inst_prefetch_distance  0x0003                  // 000000000004: BF840003
s_mov_b32     m0, s4                                  // 000000000008: BEFD0004
s_mov_b64     s[12:13], exec                          // 00000000000C: BE8C017E
s_wqm_b64     exec, exec                              // 000000000010: BEFE1D7E
s_getpc_b64   s[0:1]                                  // 000000000014: BE804780
s_waitcnt_depctr  depctr_vm_vsrc(0) & depctr_va_vdst(0) // 000000000018: BF880F83
lds_param_load  v2, attr0.x wait_vdst:0               // 00000000001C: CE000002
lds_param_load  v3, attr0.y wait_vdst:0               // 000000000020: CE000103
s_mov_b32     s4, s2                                  // 000000000024: BE840002
s_mov_b32     s5, s1                                  // 000000000028: BE850001
s_mov_b32     s0, s3                                  // 00000000002C: BE800003
s_load_b256   s[4:11], s[4:5], null                   // 000000000030: F40C0102 F8000000
s_load_b128   s[0:3], s[0:1], null                    // 000000000038: F4080000 F8000000
v_interp_p10_f32  v4, v2, v0, v2 wait_exp:1           // 000000000040: CD000104 040A0102
v_interp_p10_f32  v0, v3, v0, v3 wait_exp:0           // 000000000048: CD000000 040E0103
s_delay_alu  instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2) // 000000000050: BF870112
v_interp_p2_f32  v2, v2, v1, v4 wait_exp:7            // 000000000054: CD010702 04120302
v_interp_p2_f32  v0, v3, v1, v0 wait_exp:7            // 00000000005C: CD010700 04020303
s_and_b64     exec, exec, s[12:13]                    // 000000000064: 8BFE0C7E
s_waitcnt     lgkmcnt(0)                              // 000000000068: BF89FC07
image_sample  v[0:3], [v2,v0], s[4:11], s[0:3] dmask:0xf dim:SQ_RSRC_IMG_2D // 00000000006C: F06C0F05 00010002 00000000
s_waitcnt     vmcnt(0)                                // 000000000078: BF8903F7
v_cvt_pk_rtz_f16_f32  v0, v0, v1                      // 00000000007C: 5E000300
v_cvt_pk_rtz_f16_f32  v2, v2, v3                      // 000000000080: 5E040702
s_mov_b64     exec, s[12:13]                          // 000000000084: BEFE010C
exp           mrt0, v0, v2, off, off done             // 000000000088: F8000803 00000200
s_endpgm                                              // 000000000090: BFB00000

We will not analyze it here in detail, but it is worth noting that we have 3 memory loading instructions here, which correspond to the operations we do in the pixel shader: s_load_b256 and s_load_b128 load the descriptors of the texture t0 and the sampler s0, respectively, which are then both used by the image_sample instruction to perform the texture sampling.

The diagram

We talked about many different formats of root signatures already, and there will be more. It is time to show a diagram that gathers them all and presents transitions between them. This is the central part of our article that we will refer to. Note that we already talked about representations number #1, #2, #3, #4, which you can find on the diagram.

Deserializing root signature

There is a way to convert a serialized root signature blob back to data structures. Microsoft offers the function D3D12CreateRootSignatureDeserializer for this purpose. It creates an object of type ID3D12RootSignatureDeserializer, which owns the structure D3D12_ROOT_SIGNATURE_DESC and other structures referenced by it. Example code:

ComPtr<ID3D12RootSignatureDeserializer> root_sig_deserializer;
hr = D3D12CreateRootSignatureDeserializer(root_sig_data, root_sig_data_size,
    IID_PPV_ARGS(&root_sig_deserializer));
// Check hr...
const D3D12_ROOT_SIGNATURE_DESC* root_sig_desc = root_sig_deserializer->GetRootSignatureDesc();
// Inspect decoded root_sig_desc... 

When using higher root signature versions, you need to use function D3D12CreateVersionedRootSignatureDeserializer and interface ID3D12VersionedRootSignatureDeserializer instead.

#5. Text format

We are only in the middle of this article. This is because Microsoft prepared one more representation of the root signature - a text representation. For it, they defined a simple domain-specific language, which is fully documented on the page Specifying Root Signatures in HLSL. As an example, our simple root signature presented in this article would look like this:

RootFlags(ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT),
DescriptorTable(CBV(b4), visibility=SHADER_VISIBILITY_VERTEX),
DescriptorTable(SRV(t0), visibility=SHADER_VISIBILITY_PIXEL),
DescriptorTable(Sampler(s0), visibility=SHADER_VISIBILITY_PIXEL)

I am sure you can recognize the same parameters we passed when we assembled a data structure describing this root signature in our C++ code. The text representation is clearly more concise and readable.

However, this is not exactly the way we specify root signatures in text format. It will go to our HLSL shader source file, but before we can put it there, we must pack it to a string defined using a #define macro, so it takes the form of:

#define MyRootSig "RootFlags(ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT), " \
    "DescriptorTable(CBV(b4), visibility=SHADER_VISIBILITY_VERTEX), " \
    "DescriptorTable(SRV(t0), visibility=SHADER_VISIBILITY_PIXEL), " \
    "DescriptorTable(Sampler(s0), visibility=SHADER_VISIBILITY_PIXEL)"

This is our root signature representation number #5 on the diagram. It looks somewhat clumsy, but this is the way we need to format it. The backslash symbol "\" at the end of each line except the last one is necessary to continue the #define macro in the next line. This is a feature of the HLSL preprocessor, the same as in the C and C++ preprocessors.

We could simplify this macro by putting the whole string with our root signature in a single line, but I am not convinced it would make it more readable. Besides, formatting root signatures as I have shown above is the way recommended by Microsoft in their documentation.

If you think about converting a root signature back to the text representation, there is no ready function for that, but you can find such code in the RGA source, file "source/radeon_gpu_analyzer_backend/autogen/be_rootsignature_dx12.cpp", class RootSignatureUtil. I marked it as an arrow leading from #1 to #5 on the diagram, described as "Custom code".
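To give an idea of what such custom code could look like, here is a minimal sketch that is not based on the RGA implementation. It assumes a heavily simplified, hypothetical description (SimpleTableParam) in which every root parameter is a descriptor table with a single one-descriptor range, which is enough to cover the example from this article. A real converter would have to handle all root parameter types, register spaces, flags, and static samplers.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical simplified types, not the real D3D12 structures.
enum class RangeType { CBV, SRV, UAV, Sampler };
enum class Visibility { All, Vertex, Pixel };

// One descriptor table holding a single range with one descriptor.
struct SimpleTableParam
{
    RangeType type;
    std::uint32_t base_register;
    Visibility visibility;
};

std::string RangeToText(RangeType type, std::uint32_t base_register)
{
    const std::string reg = std::to_string(base_register);
    switch (type)
    {
    case RangeType::CBV:     return "CBV(b" + reg + ")";
    case RangeType::SRV:     return "SRV(t" + reg + ")";
    case RangeType::UAV:     return "UAV(u" + reg + ")";
    case RangeType::Sampler: return "Sampler(s" + reg + ")";
    }
    return std::string();
}

const char* VisibilityToText(Visibility visibility)
{
    switch (visibility)
    {
    case Visibility::All:    return "SHADER_VISIBILITY_ALL";
    case Visibility::Vertex: return "SHADER_VISIBILITY_VERTEX";
    case Visibility::Pixel:  return "SHADER_VISIBILITY_PIXEL";
    }
    return "";
}

// Builds the text representation as a single line, without the #define wrapper.
std::string RootSignatureToText(bool allow_input_layout,
    const std::vector<SimpleTableParam>& params)
{
    std::string text = allow_input_layout
        ? "RootFlags(ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT)"
        : "RootFlags(0)";
    for (const SimpleTableParam& p : params)
    {
        text += ", DescriptorTable(" + RangeToText(p.type, p.base_register) +
            ", visibility=" + VisibilityToText(p.visibility) + ")";
    }
    return text;
}
```

Feeding it the three parameters from our table (CBV b4 for the vertex shader, SRV t0 and sampler s0 for the pixel shader) produces the same text as the root signature shown earlier, joined into one line.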

#6. Attaching root signature to shaders

Having our root signature defined in the text format, packed into a #define macro, and included in our HLSL shader source file is the first step. Just like a single HLSL file can contain multiple entry points to various shaders, it can also contain multiple root signature definitions, so we need to specify the one to use. To do this, we can attach a root signature to the function used as the shader entry point, using the [RootSignature()] attribute with the name of our macro inside.

Here is the full contents of a new shader file "Shader2.hlsl" with root signature embedded:

struct VsInput
{
 float3 pos : POSITION;
 float2 tex_coord : TEXCOORD;
};
struct VsOutput
{
 float4 pos : SV_Position;
 float2 tex_coord : TEXCOORD;
};

struct VsConstants
{
 float4x4 model_view_proj;
};

#define MyRootSig "RootFlags(ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT), " \
    "DescriptorTable(CBV(b4), visibility=SHADER_VISIBILITY_VERTEX), " \
    "DescriptorTable(SRV(t0), visibility=SHADER_VISIBILITY_PIXEL), " \
    "DescriptorTable(Sampler(s0), visibility=SHADER_VISIBILITY_PIXEL)"

ConstantBuffer<VsConstants> vs_constant_buffer : register(b4);

[RootSignature(MyRootSig)]
VsOutput VsMain(VsInput i)
{
 VsOutput o;
 o.pos = mul(float4(i.pos, 1.0), vs_constant_buffer.model_view_proj);
 o.tex_coord = i.tex_coord;
 return o;
}

Texture2D<float4> color_texture : register(t0);
SamplerState color_sampler : register(s0);

[RootSignature(MyRootSig)]
float4 PsMain(VsOutput i) : SV_Target
{
 return color_texture.Sample(color_sampler, i.tex_coord);
}

If you compile VS and PS from this file using commands:

dxc -T vs_6_0 -E VsMain -Fo Shader2.vs.bin Shader2.hlsl
dxc -T ps_6_0 -E PsMain -Fo Shader2.ps.bin Shader2.hlsl

The new files "Shader2.vs.bin" and "Shader2.ps.bin" will be larger than the respective "Shader1.vs.bin" and "Shader1.ps.bin" we created earlier by exactly 168 bytes, which is similar to the size of our serialized root signature. This indicates that our root signature is bundled together with the compiled shader code. This is the representation number #6 on the diagram.

Shaders compiled with a root signature embedded can then be used in the C++/D3D12 code for creating a PSO without a need to specify the root signature explicitly. Variable D3D12_GRAPHICS_PIPELINE_STATE_DESC::pRootSignature can be set to null. Our PSO creation code can now look like this:

D3D12_GRAPHICS_PIPELINE_STATE_DESC pso_desc = {};
pso_desc.pRootSignature = NULL; // Sic!
pso_desc.VS.pShaderBytecode = vs.data(); // Vertex shader from "Shader2.vs.bin".
pso_desc.VS.BytecodeLength = vs.size();
pso_desc.PS.pShaderBytecode = ps.data(); // Pixel shader from "Shader2.ps.bin".
pso_desc.PS.BytecodeLength = ps.size();
pso_desc.RasterizerState.FillMode = D3D12_FILL_MODE_SOLID;
pso_desc.RasterizerState.CullMode = D3D12_CULL_MODE_NONE;
pso_desc.InputLayout.NumElements = _countof(input_elems);
pso_desc.InputLayout.pInputElementDescs = input_elems;
pso_desc.PrimitiveTopologyType = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE;
pso_desc.NumRenderTargets = 1;
pso_desc.RTVFormats[0] = DXGI_FORMAT_R8G8B8A8_UNORM_SRGB;
pso_desc.SampleDesc.Count = 1;

ComPtr<ID3D12PipelineState> pso;
hr = g_Device->CreateGraphicsPipelineState(&pso_desc, IID_PPV_ARGS(&pso));
// Check hr...

Similarly, we can use RGA to compile those shaders, assemble the PSO, and output AMD GPU assembly:

rga -s dx12 -c gfx1100 --all-hlsl Shader2.hlsl --all-model 6_0 --vs-entry VsMain --ps-entry PsMain --offline --isa AMD_ISA

Because we can use multiple shaders at different shader stages (vertex shader, pixel shader, possibly also hull, domain, geometry, amplification, mesh shader...) when creating a PSO, and we attached a [RootSignature()] attribute to all of them, you may ask what happens if some shader stages don't specify a root signature or specify a different one. Here are the rules:

D3D12 ERROR: ID3D12Device::CreateGraphicsPipelineState: Root Signature doesn't match Pixel Shader: Root signature of Vertex Shader doesn't match the root signature of Pixel Shader

Compiling standalone root signature from text

When we have a root signature encoded in the text format, we can use it in two ways. One is attaching it to a shader entry point function using the [RootSignature()] attribute, like we've seen in the previous section. The second one is compiling root signature alone. For this, we need to use dedicated command-line arguments for "dxc.exe" and specify the name of our macro.

Let's create a separate HLSL file with only the root signature, called "RootSig.hlsl":

#define MyRootSig "RootFlags(ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT), " \
    "DescriptorTable(CBV(b4), visibility=SHADER_VISIBILITY_VERTEX), " \
    "DescriptorTable(SRV(t0), visibility=SHADER_VISIBILITY_PIXEL), " \
    "DescriptorTable(Sampler(s0), visibility=SHADER_VISIBILITY_PIXEL)"

Let's now use the following command to compile it:

dxc -T rootsig_1_0 -E MyRootSig -Fo RootSigFromHlsl.bin RootSig.hlsl

The output of this command is file "RootSigFromHlsl.bin", which is 188 bytes - exactly the same size as the file "RootSigFromCode.bin" we created earlier by filling in data structures in C++ and serializing them. Thus, we can say we just learned the way to create serialized root signature binary from the text representation. We can now connect two existing blocks in our diagram with the arrow leading from #5 to #2.

Note you can use our previous file "Shader2.hlsl" instead of "RootSig.hlsl" with the same effect. That file contains shader functions, but they just get ignored, as we only use the MyRootSig macro.

Repacking root signatures

Because there are so many ways of storing root signatures, Microsoft provided a possibility to convert between them using dedicated command-line parameters of DXC:

We can specify a compiled shader with a root signature embedded and extract only the root signature blob from it (connection from #6 to #2 in our diagram):

dxc -dumpbin -extractrootsignature -Fo RootSigExtracted.bin Shader2.vs.bin

The -dumpbin parameter means that the input file (specified as the positional argument at the end) is a compiled binary, not a text file with HLSL source.

We can transform a compiled shader file into one with the embedded root signature removed. This path is not shown in the diagram. The output file "ShaderNoRootSig.vs.bin" has the same size (4547 B) as "Shader1.vs.bin" that we compiled previously without a root signature.

dxc -dumpbin -Qstrip_rootsignature -Fo ShaderNoRootSig.vs.bin Shader2.vs.bin

We can also join two binary files: one with compiled shader, one with root signature blob, and create a file with the shader and the root signature embedded in it. This is shown on the diagram as a path from #2 to #6.

dxc -dumpbin -setrootsignature RootSigFromCode.bin -Fo ShaderWithRootSigAdded.vs.bin Shader1.vs.bin

Warning about DXC parameters

I've shown all these commands here, because it is very important to get them right. Microsoft did a terrible job here offering many options in the command-line syntax that can be misleading. For example:

Moreover, if you do it the wrong way, DXC prints some cryptic, unrelated error message or prints nothing, does nothing, and exits with process exit code 0. Not very helpful!

Usage of RGA

Radeon GPU Analyzer utilizes DXC internally, so it can be used to compile shaders from HLSL source code all the way to the pipeline state object (both stages of the shader compilation). That PSO is created internally just to extract the final ISA assembly from it. Here is a command we've seen before:

rga -s dx12 -c gfx1100 --all-hlsl Shader2.hlsl --all-model 6_0 --vs-entry VsMain --ps-entry PsMain --offline --isa AMD_ISA

However, RGA supports many more command-line options. Input shaders can be specified in HLSL format using --all-hlsl FILE or per-stage --vs FILE, --ps FILE etc. with mandatory entry point function names passed as --vs-entry NAME, --ps-entry NAME, etc. Alternatively, we can specify compiled binary shaders as input. Then, the input is the intermediate shader representation, while RGA performs only the second stage of the shader compilation.

rga -s dx12 -c gfx1100 --vs-blob Shader2.vs.bin --ps-blob Shader2.ps.bin --offline --isa AMD_ISA

Similarly, a root signature can be specified in one of many ways:

1. Embedded in shaders, like in the 2 commands shown above, as our "Shader2" was compiled with the root signature.

2. From a separate HLSL file and specific #define macro:

rga -s dx12 -c gfx1100 --all-hlsl Shader1.hlsl --all-model 6_0 --vs-entry VsMain --ps-entry PsMain --rs-hlsl RootSig.hlsl --rs-macro MyRootSig --offline --isa AMD_ISA

3. From a binary file with the serialized root signature:

rga -s dx12 -c gfx1100 --all-hlsl Shader1.hlsl --all-model 6_0 --vs-entry VsMain --ps-entry PsMain --rs-bin RootSigFromCode.bin --offline --isa AMD_ISA

4. None at all. Then, a root signature matching the compiled shaders gets auto-generated. This is a new feature of RGA 2.9.1.

rga -s dx12 -c gfx1100 --all-hlsl Shader1.hlsl --all-model 6_0 --vs-entry VsMain --ps-entry PsMain --offline --isa AMD_ISA

Using DXC as a library

Up to this point, our discussion has centered around C++ code specifically tailored for loading compiled shaders and creating a PSO, typically for D3D12 rendering purposes, such as game development or other graphics applications. The compilation of shaders was carried out exclusively using standalone command-line tools: DXC and RGA.

However, the DXC shader compiler can also be used in the form of a C++ library. Everything we can do with "dxc.exe" we can also do programmatically from our code by using the equivalent library. To use the library:

  1. LoadLibrary "dxil.dll" and "dxcompiler.dll".
  2. GetProcAddress of only one function: DxcCreateInstance, as everything starts from it.

I won't describe the library in detail here. It is out of scope of this article, and the article is already very long. However, I would like to point to some interesting features:

1. Certainly, we can compile a shader. To do it, use function IDxcCompiler3::Compile. Interestingly, we don't fill in data structures with specific parameters for the compiler, like we would normally expect from a programmatic API. Instead, we are asked to format a list of strings with parameters, same as we would pass to the command-line DXC, e.g.:

const wchar_t* arguments[] = {
    L"-T ps_6_0",
    L"-E PsMain",
    // Etc...
};

Because we talk about root signatures here, it is worth noting that we can check if the compiled shader has one embedded. Calling IDxcResult::GetOutput with parameter DXC_OUT_OBJECT returns the compiled shader blob, DXC_OUT_ERRORS returns a string with errors and warnings, while DXC_OUT_ROOT_SIGNATURE tells us that the shader had a root signature attached.

2. The DXC library offers an interesting feature called reflection. It allows inspecting an existing compiled shader binary for various parameters, including inputs, outputs, and resource bindings. Inputs and outputs are vertex attributes or (in case of a pixel shader output) render targets written. The list of resource bindings is the most interesting for us here, because it allows generating a root signature compatible with the shader.

Certainly, there isn't just one possible root signature compatible with a given shader, so a generated one may not align with your requirements. For example, a constant buffer b4 can be bound to a shader in one of 3 ways: as 32-bit root constants, as a root CBV, or as a descriptor table containing a CBV. Similarly, multiple subsequent slots like (b2, b3, b4) can be defined in a root signature as separate root parameters or as a single parameter with a descriptor table carrying numDescriptors = 3. However, reflection can still be useful sometimes if you develop your own engine and you want to automate resource binding based on the shader code.

To use this feature, call IDxcUtils::CreateReflection, pass the shader binary, and retrieve a new object of type ID3D12ShaderReflection. You can then query it for parameters using methods like ID3D12ShaderReflection::GetResourceBindingDesc. You can see an example of shader reflection used to generate the root signature in the RGA source code - see file "source/radeon_gpu_analyzer_backend/autogen/be_reflection_dx12.cpp" and other related places.
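As a rough illustration of the grouping choice mentioned above, here is a minimal sketch that merges consecutive register slots of the same type into descriptor ranges, the way a reflection-based generator might. The `Binding` and `Range` structs and the `GroupIntoRanges` function are hypothetical names invented for this example. In real code, the input would come from ID3D12ShaderReflection::GetResourceBindingDesc, and register spaces and bind counts would need to be handled too.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for the data obtained from shader reflection.
enum class RegisterType { CBV, SRV, UAV, Sampler };

struct Binding {
    RegisterType type;
    uint32_t slot; // E.g. 4 for b4.
};

struct Range {
    RegisterType type;
    uint32_t baseSlot;
    uint32_t numDescriptors;
};

// Merges bindings of the same type that occupy consecutive slots into one range,
// e.g. (b2, b3, b4) becomes a single range equivalent to CBV(b2, numDescriptors=3).
std::vector<Range> GroupIntoRanges(std::vector<Binding> bindings) {
    // Sort by type first, then by slot, so neighbors end up next to each other.
    std::sort(bindings.begin(), bindings.end(),
        [](const Binding& a, const Binding& b) {
            if (a.type != b.type) return a.type < b.type;
            return a.slot < b.slot;
        });

    std::vector<Range> ranges;
    for (const Binding& b : bindings) {
        if (!ranges.empty() && ranges.back().type == b.type) {
            Range& last = ranges.back();
            const uint32_t endSlot = last.baseSlot + last.numDescriptors;
            if (b.slot < endSlot)
                continue; // Slot already covered (duplicate entry).
            if (b.slot == endSlot) {
                ++last.numDescriptors; // Extends the current range by one slot.
                continue;
            }
        }
        ranges.push_back({b.type, b.slot, 1}); // Starts a new range.
    }
    return ranges;
}
```

Whether to emit each resulting range as its own descriptor table, or to promote single-slot CBVs to root descriptors, remains a policy decision that reflection alone cannot make for you.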

3. The DXC library also provides tools to manipulate the binary container format, enabling tasks such as extracting, adding, or removing a root signature from a shader. To use them, search the library header for the interface IDxcContainerReflection or the simpler function IDxcUtils::GetDxilContainerPart, as well as the interface IDxcContainerBuilder. For example, you can check if a shader binary contains an embedded root signature using the following code:

void* part_data = nullptr;
uint32_t part_size = 0;
HRESULT hr = dxc_utils->GetDxilContainerPart(&shader_binary,
    DXC_PART_ROOT_SIGNATURE, &part_data, &part_size);
bool has_root_signature = SUCCEEDED(hr);
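To show roughly what such a lookup does under the hood, here is a minimal sketch that scans a container blob for a part with a given FourCC, without using the DXC library. It assumes the container layout as I understand it from the DXC source (a 32-byte header starting with "DXBC", followed by a table of part offsets, each pointing to a part header made of a FourCC and a size). The names `MakeFourCC`, `ReadU32`, and `ContainerHasPart` are hypothetical helpers made up for this example. In production code, calling GetDxilContainerPart is the safer choice.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds a four-character code in little-endian order, like the DXC_FOURCC macro.
constexpr uint32_t MakeFourCC(char a, char b, char c, char d) {
    return uint32_t(uint8_t(a)) | uint32_t(uint8_t(b)) << 8 |
           uint32_t(uint8_t(c)) << 16 | uint32_t(uint8_t(d)) << 24;
}

// Reads a little-endian 32-bit value. The caller is responsible for bounds checks.
static uint32_t ReadU32(const std::vector<uint8_t>& blob, size_t offset) {
    return uint32_t(blob[offset]) | uint32_t(blob[offset + 1]) << 8 |
           uint32_t(blob[offset + 2]) << 16 | uint32_t(blob[offset + 3]) << 24;
}

// Returns true if the container holds a part with the given FourCC.
// Assumed header layout: FourCC "DXBC" (4 B), hash (16 B), version (4 B),
// container size (4 B), part count (4 B), then one 32-bit offset per part.
bool ContainerHasPart(const std::vector<uint8_t>& blob, uint32_t partFourCC) {
    constexpr size_t kHeaderSize = 32;
    constexpr size_t kPartCountOffset = 28;
    constexpr size_t kPartHeaderSize = 8; // FourCC + part size.

    if (blob.size() < kHeaderSize ||
        ReadU32(blob, 0) != MakeFourCC('D', 'X', 'B', 'C'))
        return false;

    const uint32_t partCount = ReadU32(blob, kPartCountOffset);
    if (partCount > (blob.size() - kHeaderSize) / 4)
        return false; // The offset table would not fit in the blob.

    for (uint32_t i = 0; i < partCount; ++i) {
        const uint32_t partOffset = ReadU32(blob, kHeaderSize + size_t(i) * 4);
        if (partOffset > blob.size() - kPartHeaderSize)
            continue; // Part header out of bounds, skip it.
        if (ReadU32(blob, partOffset) == partFourCC)
            return true;
    }
    return false;
}
```

With this helper, `ContainerHasPart(blob, MakeFourCC('R', 'T', 'S', '0'))` would answer the same question as the snippet above for a blob loaded from a file.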

How many root signatures to use?

As for the policy regarding the usage of root signatures, do they need to match our shaders exactly? No, but the following rules apply:

You may ask: Can I just create one big all-encompassing root signature that defines all the resource bindings I may ever need and use it for all my shaders? Theoretically you could, but there are two main arguments against doing this.

  1. Root signatures cannot be arbitrarily big. There is a limit on the total size of root parameters, calculated in units of virtual DWORDs. Whether these are real 32-bit DWORDs in the D3D12 implementation provided by the graphics driver is not important here. What matters is that root constants cost 1 DWORD per 4 B of data, every root descriptor costs 2 DWORDs, and every descriptor table costs 1 DWORD, while the limit is 64 DWORDs in total. You can find it documented here: "Root Signature Limits". Sample code for calculating this cost can be found in the RGA source: file "source/radeon_gpu_analyzer_backend/autogen/be_rootsignature_dx12.cpp", function CalculateRootSignatureCost. This limit can be worked around, though, by defining descriptor tables with whole ranges of descriptors in them as a single root parameter, e.g. DescriptorTable(CBV(b0, numDescriptors=10)).
  2. "AMD RDNA Performance Guide" in section "Descriptors" recommends trying to minimize the size of root signatures/descriptor set layouts, to avoid spilling user data to memory. This would likely mean an extra level of indirection in the internal implementation causing some performance overhead.
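The cost rules from the first point can be sketched in a few lines of code. This is not the RGA implementation, only a simplified illustration. The `RootParam` struct and the functions `CalculateCost` and `FitsInLimit` are hypothetical names invented for this example, standing in for the real D3D12_ROOT_PARAMETER structures.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified description of a root parameter, for illustration only.
enum class RootParamKind { Constants, Descriptor, DescriptorTable };

struct RootParam {
    RootParamKind kind;
    uint32_t num32BitValues; // Used only when kind == Constants.
};

constexpr uint32_t kMaxRootSignatureCost = 64; // Limit in DWORDs.

// Sums the cost of all root parameters in DWORDs.
uint32_t CalculateCost(const std::vector<RootParam>& params) {
    uint32_t cost = 0;
    for (const RootParam& p : params) {
        switch (p.kind) {
        case RootParamKind::Constants:
            cost += p.num32BitValues; // 1 DWORD per 32-bit value.
            break;
        case RootParamKind::Descriptor:
            cost += 2; // Root CBV/SRV/UAV.
            break;
        case RootParamKind::DescriptorTable:
            cost += 1; // Regardless of how many descriptors the table holds.
            break;
        }
    }
    return cost;
}

// Checks the total cost against the 64 DWORD limit.
bool FitsInLimit(const std::vector<RootParam>& params) {
    return CalculateCost(params) <= kMaxRootSignatureCost;
}
```

For instance, ten CBVs declared as separate root descriptors cost 20 DWORDs under these rules, while the same ten placed in one descriptor table cost just 1.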

On the other hand, switching the root signature for every shader and rebinding all the root arguments can have its overhead too. If you look at Cyberpunk 2077, for example, you can see that they just use one big root signature for all graphics shaders and a second one for all compute shaders in the game. I am not disclosing any secret here. If you own the game on Steam or GOG, you can capture a frame using PIX on Windows and see it for yourself. If they could do it in a AAA game that looks and runs so well, do we really need to optimize better? 😀

Update 2024-05-15: In the comments below my post on Mastodon, others disclosed that the Frostbite engine by DICE, as well as the engine developed by Digital Extremes, take the same approach.

Summary

This article offers a comprehensive description of various formats of root signatures in Direct3D 12. We've explored some C++ code along with the utilization of command-line tools such as the DXC shader compiler from Microsoft and the Radeon GPU Analyzer (RGA) from AMD. A root signature can be authored or stored as:

We've learned how to use these representations and how to convert between them.
