Tag: directx

Entries for tag "directx", ordered from most recent. Entry count: 62.


# Porting your engine to Vulkan or DX12 - video from my talk

12:23, Fri 06 Jul 2018

The organizers of the Digital Dragons conference have published the video recording of my talk "Porting your engine to Vulkan or DX12".

PowerPoint slides are also available for download here: Porting your engine to Vulkan or DX12 - GPUOpen.

Comments | #vulkan #directx #teaching #events #video Share

# Memory management in Vulkan and DX12: slides are online

12:41, Tue 03 Apr 2018

Slides from my Game Developers Conference (GDC) 2018 talk "Memory management in Vulkan and DX12" are now available online, as part of the materials from the Advanced Graphics Techniques Tutorial. Access to this PDF is open to everyone, not behind the GDC Vault paywall. I've put some additional information that I didn't show during my presentation in "backup" slides at the end. The slides are designed so that you can learn from them even without having seen the talk.

Update 2018-05-04: Slides from my talk in PPTX format with additional notes are now available (together with many other GDC 2018 presentations) on page: GDC 2018 Presentations - GPUOpen.

Comments | #events #teaching #gdc #directx #vulkan Share

# Switchable graphics versus D3D11 adapters

00:34, Sat 24 Feb 2018

When you have a laptop with so-called "switchable graphics" (like I do in my Lenovo IdeaPad G50-80), you effectively have two GPUs. In my case, these are: Intel i7-5500U and AMD Radeon R5 M330. While programming in DirectX 11, you can enumerate these two adapters and choose either of them when creating an ID3D11Device object. For quite some time I was wondering how the various "switchable graphics" settings affect my app. Today I finally figured it out. Long story short: they just change the order of the adapters as visible to my program, so that the appropriate one appears as adapter 0. Here is the full story:

It looks like the base setting is the one that can be found in Windows Settings > Power options > edit your power plan > Switchable Dynamic Graphics. (Not to be confused with "AMD Graphics Power Settings"!) When you set it to "Optimize power savings" or "Optimize performance", the application sees the Intel GPU as the first adapter.

When you choose "Maximize performance", the application sees the AMD GPU as the first adapter.

I also found that Radeon Settings (the app that comes with the AMD graphics driver) overrides this system setting. If you go to System > Switchable Graphics and create a configuration for your specific executable, then again: choosing "Power Saving" makes your app see the Intel GPU as the first adapter, while choosing "High Performance" makes the AMD GPU first.

It's as simple as that. Basically, if you always use the first adapter you find, you follow the recommended settings of the system. You are still free to use the other adapter when creating your D3D11 device. I checked that: it works, and it really uses that GPU.

This is especially important if you hit a strange bug where your app hangs on one of these GPUs.
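If you want to pick an adapter explicitly, a minimal sketch could look like this (error handling omitted; chosenAdapterIndex is a placeholder for your own selection logic):

#include <dxgi.h>
#include <d3d11.h>
#include <stdio.h>

IDXGIFactory1* factory = nullptr;
CreateDXGIFactory1(IID_PPV_ARGS(&factory));

// Enumerate and print all adapters. The order of Intel and AMD here is
// exactly what the switchable graphics settings control.
IDXGIAdapter1* adapter = nullptr;
for(UINT i = 0; factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i)
{
    DXGI_ADAPTER_DESC1 desc;
    adapter->GetDesc1(&desc);
    wprintf(L"Adapter %u: %s\n", i, desc.Description);
    adapter->Release();
}

// Create the device on an explicitly chosen adapter. When a non-null
// adapter is passed, the driver type must be D3D_DRIVER_TYPE_UNKNOWN.
factory->EnumAdapters1(chosenAdapterIndex, &adapter);
ID3D11Device* device = nullptr;
D3D11CreateDevice(adapter, D3D_DRIVER_TYPE_UNKNOWN, nullptr, 0,
    nullptr, 0, D3D11_SDK_VERSION, &device, nullptr, nullptr);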

Update 2018-05-02: Microsoft plans to add an API for enumerating adapters based on a given GPU preference (minimum power or high performance). See IDXGIFactory6::EnumAdapterByGpuPreference.
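Based on the announced interface, usage is expected to look roughly like this (a sketch; requires a system with DXGI 1.6 available):

IDXGIFactory6* factory6 = nullptr;
CreateDXGIFactory1(IID_PPV_ARGS(&factory6));
// Enumeration position 0 is the most preferred adapter for the requested
// preference, regardless of the switchable graphics settings.
IDXGIAdapter1* adapter = nullptr;
factory6->EnumAdapterByGpuPreference(0, DXGI_GPU_PREFERENCE_HIGH_PERFORMANCE,
    IID_PPV_ARGS(&adapter));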

Comments | #directx Share

# Memory management in Vulkan and DX12 - my talk at GDC 2018

11:26, Wed 21 Feb 2018

If you happen to come to this year's Game Developers Conference (GDC), I'd like to invite you to my talk: "Memory management in Vulkan and DX12". During this lecture I will not only advertise my Vulkan Memory Allocator library, but also show technical details, tips, and tricks for GPU memory management that you can use on your own when programming with Vulkan or Direct3D 12.

A list of presentations by my colleagues at AMD can be found here: GDC 2018 Presentations - GPUOpen, and the list of all GDC talks is available in the Session Scheduler on gdconf.com.

Comments | #vulkan #directx #gdc #events #teaching Share

# When integrated graphics works better

10:36, Sat 17 Feb 2018

In RPG games, the more powerful your character is, the tougher and scarier the monsters you have to fight. I sometimes get the feeling that the same applies to real life - to the bugs you meet as a programmer. I recently blogged about an issue where a QueryPerformanceCounter call takes a long time. I've just met another weird problem. Here is my story:

I have a Lenovo IdeaPad G50-80 (80E502ENPB) laptop. It has switchable graphics: integrated Intel i7-5500U and dedicated AMD Radeon R5 M330. Of course I used to choose the AMD dedicated graphics, because it's more powerful. My application is a music visualization program. It renders graphics using Direct3D 11. It uses one ID3D11Device object and one thread for rendering, but two windows displayed on two outputs: output 1 (laptop screen) contains a window with GUI and preview, while output 2 (projector connected via VGA or HDMI) shows the main view using a borderless, topmost window covering the whole screen (but not real fullscreen as in IDXGISwapChain::SetFullscreenState). I tend to enable V-sync on output 1 (IDXGISwapChain::Present SyncInterval = 1) and disable it on output 2 (SyncInterval = 0). My rendering algorithm looks like this:

Loop over frames:
    Render scene to MainRenderTarget
    Render MainRenderTarget to OutputBackBuffer, covering whole screen
    Render MainRenderTarget to PreviewBackBuffer, on a quad
    Render ImGui to PreviewBackBuffer
    OutputSwapChain->Present()
    PreviewSwapChain->Present()
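In D3D11 terms, the two Present calls at the end of the loop differ only in the sync interval passed as the first parameter of IDXGISwapChain::Present, e.g.:

OutputSwapChain->Present(0, 0);  // output 2 (projector): SyncInterval = 0, V-sync off
PreviewSwapChain->Present(1, 0); // output 1 (laptop screen): SyncInterval = 1, V-sync on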

So far I had just one problem with it: my framerate decreased over time. It would drop quickly after launching the app, from 60 to 30 FPS, and stabilize there, but after a few hours it would steadily decrease to 20 FPS or even less. I couldn't identify a reason for it in my code, like a memory leak. It seemed to be related to rendering. I could somehow live with this issue - the low framerate was not that noticeable.

Suddenly this Thursday, when I wanted to test a new version of the program, I realized it hangs around a minute after launching. It was a strange situation in which the app seemed to be running normally, but it was just not rendering any new frames. I could see it was still working by inspecting CPU usage and the thread list with Process Hacker. I could minimize its windows or cover them with other windows, and they preserved their content after restoring. I even captured a trace in GPUView, only to notice that the app was filling the DirectX command queue and the AMD GPU was working. Still, nothing was rendered.

That was a frightening situation for me, because I needed to have it working for the weekend. After checking that restarting the app or the whole system doesn't help, I tried to identify the cause and fix it in various ways:

1. I thought that maybe there was just some bug in the new version of my program, so I launched the previous version - one that had worked successfully before, reaching more than 10 hours of uptime. Unfortunately, the problem still occurred.

2. I thought that maybe it was a bug in the new AMD graphics driver, so I downloaded and installed the previous version, performing a "clean install". It didn't help either.

3. In desperation, I formatted the whole hard drive and reinstalled the operating system. I had planned to do it anyway, because it was a 3-year-old system, upgraded from Windows 8, and I had some other problems with it (which I don't describe here because they were unrelated to graphics). I installed the latest, clean Windows 10 with the latest updates and all the drivers. Even that didn't solve my problem. The program still hung soon after every launch.

I finally came up with the idea of switching my app to the integrated Intel graphics. It can be done in Radeon Settings > "Switchable Graphics" tab. In the popup menu for a specific executable, "High Performance" means the dedicated AMD GPU and "Power Saving" means the integrated Intel GPU. See the article Configuring Laptop Switchable Graphics... for details.

It solved my problem! The program not only no longer hangs, but it also maintains a stable 60 FPS now (at least it did during my 2-hour test). The framerate drops only when there is a scene that blends many layers together on a Full HD output - apparently this GPU cannot keep up with drawing so many pixels per second. Anyway, this is a situation where the integrated Intel graphics turns out to work better than the faster, dedicated GPU.

I still don't know the cause of this strange bug. Is it something in the way my app uses D3D11? Or is it a bug in one of the two graphics drivers I need to have installed? I'd like to investigate it further when I find some time. For now, I tend to believe that:

- The only thing that might have changed recently and broken my app was some Windows update pushed by Microsoft.

- The two issues - the one I had before with the framerate decreasing over time and the new one with the total image freeze - are related. They may have something to do with switchable graphics: having two different GPUs in the system, both enabled at the same time. I suspect that even when I want to use the Radeon, the outputs (or one of them) are connected to the Intel GPU anyway, so the image needs to be copied and synchronized with the Intel driver.

Update 2018-02-21: After I published this post, I tried a few other things to fix the problem. For example, I updated the AMD graphics driver to the latest version, 18.2.2. It didn't help. Then the problem disappeared as mysteriously as it had appeared - during a single system session, without a restart. My application was hanging, and a while later it started working properly. The only thing I remember doing in between was downloading and launching UIforETW - a GUI tool for capturing Event Tracing for Windows (ETW) traces, like the ones for GPUView. I know it automatically installs GPUView and other necessary tools on first launch, so that may have changed something in my system. Either way, my program now works on AMD graphics without hanging, reaching a few hours of uptime and maintaining 60 FPS, which only sometimes drops to 30 FPS but then goes back up.

Comments | #directx #gpu #windows Share

# Stencil Test Explained Using Code

20:11, Sun 20 Aug 2017

I must admit I never used the stencil buffer in my personal code. I know it has been available in GPUs and all graphics APIs for years and that it's useful for many things, but somehow I never had a need for it. Recently I became aware that I didn't fully understand it. There are many descriptions of the stencil test on the Internet, but none of them answered my questions the way I would like them answered. I thought that a piece of pseudocode would explain it better than words, so here is my explanation of the stencil test.

Let's choose Direct3D 11 as our graphics API. Other APIs have similar sets of parameters. D3D11 offers the following configuration parameters for the stencil test:

Structure D3D11_DEPTH_STENCIL_DESC:

BOOL StencilEnable
UINT8 StencilReadMask
UINT8 StencilWriteMask
D3D11_DEPTH_STENCILOP_DESC FrontFace
D3D11_DEPTH_STENCILOP_DESC BackFace

Structure D3D11_DEPTH_STENCILOP_DESC:

D3D11_STENCIL_OP StencilFailOp
D3D11_STENCIL_OP StencilDepthFailOp
D3D11_STENCIL_OP StencilPassOp
D3D11_COMPARISON_FUNC StencilFunc

Parameter passed to ID3D11DeviceContext::OMSetDepthStencilState method:

UINT StencilRef

How do they work? If you render pixel (x, y) and you have the current value of the stencil buffer available as:

UINT8 Stencil[x, y]

Then the pseudocode for the stencil test and write could look like below. First, one of the two sets of parameters is selected, depending on whether the current primitive is front-facing or back-facing:

if(StencilEnable)
{
    if(primitive has no front and back face, e.g. points, lines)
        StencilOpDesc = FrontFace
    else
    {
        if(primitive is front facing)
            StencilOpDesc = FrontFace
        else
            StencilOpDesc = BackFace
    }

Then, a test is performed:

    StencilTestPassed =
        (StencilRef & StencilReadMask) StencilOpDesc.StencilFunc
        (Stencil[x, y] & StencilReadMask)

StencilOpDesc.StencilFunc is a comparison operator that can be one of the possible enum values: D3D11_COMPARISON_NEVER, LESS, EQUAL, LESS_EQUAL, GREATER, NOT_EQUAL, GREATER_EQUAL, ALWAYS. I think these are quite self-explanatory.

Notice how StencilRef is on the left side of the comparison operator and the current stencil buffer value is on the right.

Both StencilRef and the current stencil buffer value are ANDed with StencilReadMask before the comparison.

Next, based on the result of this test, as well as the result of the depth test, a.k.a. Z-test (which is out of the scope of this article), an operation is selected:

    if(StencilTestPassed)
    {
        if(DepthTestPassed)
            Op = StencilOpDesc.StencilPassOp
        else
            Op = StencilOpDesc.StencilDepthFailOp
    }
    else
        Op = StencilOpDesc.StencilFailOp

Op is another enum that controls the new value to be written to the stencil buffer. It can be one of:

    switch(Op)
    {
    case D3D11_STENCIL_OP_KEEP: NewValue = Stencil[x, y]
    case D3D11_STENCIL_OP_ZERO: NewValue = 0
    case D3D11_STENCIL_OP_REPLACE: NewValue = StencilRef
    case D3D11_STENCIL_OP_INCR_SAT: NewValue = min(Stencil[x, y] + 1, 0xFF)
    case D3D11_STENCIL_OP_DECR_SAT: NewValue = max(Stencil[x, y] - 1, 0)
    case D3D11_STENCIL_OP_INVERT: NewValue = ~Stencil[x, y]
    case D3D11_STENCIL_OP_INCR: NewValue = Stencil[x, y] + 1 // with 8-bit wrap-around
    case D3D11_STENCIL_OP_DECR: NewValue = Stencil[x, y] - 1 // with 8-bit wrap-around
    }

Finally, the new value is written to the stencil buffer. Notice how only the bits included in StencilWriteMask are changed. The others remain unchanged.

    Stencil[x, y] =
       (Stencil[x, y] & ~StencilWriteMask) |
       (NewValue & StencilWriteMask)
}

Now that we have all this explained in a strict way using code, let me answer the doubts I had before I understood it, in the form of a FAQ.

Q: Are there no separate flags to enable the stencil test and the stencil write?

A: No. There is only one flag, StencilEnable, which enables all this functionality.

Q: So how can you use only one and not the other?

A: You can achieve either with a specific set of settings.

To perform only the stencil test and not the write, set StencilEnable to true, set StencilFunc to the comparison function you need, and set all the *Op members to KEEP (or, alternatively, set StencilWriteMask to 0) to disable any modifications of the stencil buffer.

To perform only the stencil write and not the test, set StencilEnable to true, set all the *Op members and StencilWriteMask to the values you need, and set StencilFunc to ALWAYS so the stencil test always passes.
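To make the second variant concrete, here is a minimal C++ sketch (assuming existing device and context objects; it writes the value 1 to the stencil buffer wherever a pixel is rendered):

D3D11_DEPTH_STENCIL_DESC dsDesc = {};
dsDesc.DepthEnable = FALSE; // ignore depth for this marking pass
dsDesc.StencilEnable = TRUE;
dsDesc.StencilReadMask = 0xFF;
dsDesc.StencilWriteMask = 0xFF;
dsDesc.FrontFace.StencilFunc = D3D11_COMPARISON_ALWAYS;    // test always passes
dsDesc.FrontFace.StencilPassOp = D3D11_STENCIL_OP_REPLACE; // write StencilRef
dsDesc.FrontFace.StencilFailOp = D3D11_STENCIL_OP_KEEP;
dsDesc.FrontFace.StencilDepthFailOp = D3D11_STENCIL_OP_KEEP;
dsDesc.BackFace = dsDesc.FrontFace;

ID3D11DepthStencilState* dsState = nullptr;
device->CreateDepthStencilState(&dsDesc, &dsState);
context->OMSetDepthStencilState(dsState, 1); // StencilRef = 1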

Q: Is the StencilRef value also masked by StencilReadMask?

A: Yes. As you can see in the code, it is also ANDed with StencilReadMask, just like the previous value from the stencil buffer. You don't need to provide it "pre-masked". (A comparison to "premultiplied alpha" comes to my mind...)

In other words, we could say that only the bits indicated by StencilReadMask, on both sides, participate in the comparison.

Q: What are the stencil value bits replaced with in the REPLACE Op mode?

A: They are replaced with the StencilRef value - the same one that was used in the comparison.

Q: Why is it the same StencilRef value as used for the comparison, not a separate one?

A: I don't know. There are separate StencilReadMask and StencilWriteMask, so they could have provided separate "StencilReadRef" and "StencilWriteRef" - but for some reason they didn't :P

Q: What value is incremented/decremented when Op is one of INCR*, DECR*?

A: It's the original value read from the stencil buffer, not masked or shifted in relation to StencilReadMask or StencilWriteMask. This means it doesn't make much sense to use these ops if your StencilWriteMask masks out the least significant bits, e.g. 0xF0.
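A quick worked example of why (the values are hypothetical), following the formulas above:

    // Stencil[x, y] = 0x10, Op = INCR, StencilWriteMask = 0xF0
    NewValue = 0x10 + 1 = 0x11
    Stencil[x, y] = (0x10 & ~0xF0) | (0x11 & 0xF0) = 0x00 | 0x10 = 0x10
    // The increment happened only in the masked-out low bits,
    // so the stored value doesn't change at all.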

Q: Is the depth buffer updated when the stencil test fails?

A: No. Failing the stencil test means that the pixel is discarded, so the Z-buffer is not updated and the color is not written or blended to render targets.

On the other hand, failing the Z-test can cause the stencil buffer to be updated, if you use a StencilDepthFailOp other than KEEP.

If I misunderstood something and some information in this article is wrong, please let me know by e-mail or in a comment below.

Comments | #directx #graphics Share

# ID3D11Device::CreateTexture2D: pInitialData[0].SysMemPitch cannot be 0

17:07, Sat 09 Jan 2016

The concept of "stride" or "pitch" - a step (in bytes) to be taken to proceed to the next element of a data structure - is brilliant, because it gives great flexibility. In 3D graphics, for example, the explicitly specified number of bytes between vertices in a vertex buffer or between rows of a texture can be:

  1. Equal to the exact size of the vertex structure or texture row, when the entries are laid out next to each other.
  2. Greater, so that additional padding can exist at the end of each entry, or the entries can be interleaved with some other data to be skipped in a particular case.
  3. Zero, so that the same single entry is read over and over.

That's the theory. In practice, I just discovered that option 3 doesn't work in Direct3D 11 when passing pInitialData to a created texture. I cannot see any reason why specifying D3D11_SUBRESOURCE_DATA::SysMemPitch == 0 should be considered invalid, other than trying to save the developer from a possibly unintended mistake. I think it would actually be pretty useful for initializing a texture with the same data in each row - it would be enough to allocate and fill the data for just one row instead of the full texture. Still, the following code fails on the call to CreateTexture2D:

CD3D11_TEXTURE2D_DESC textureDesc = CD3D11_TEXTURE2D_DESC(
    TEXTURE_FORMAT, // format
    (UINT)TEXTURE_SIZE.x, // width
    (UINT)TEXTURE_SIZE.y, // height
    1, // arraySize
    1, // mipLevels
    D3D11_BIND_SHADER_RESOURCE, // bindFlags
    D3D11_USAGE_DYNAMIC, // usage
    D3D11_CPU_ACCESS_WRITE); // cpuaccessFlags
std::vector<uint32_t> initialTextureRow(TEXTURE_SIZE.x);
ZeroMemory(&initialTextureRow[0], TEXTURE_SIZE.x * sizeof(uint32_t));
D3D11_SUBRESOURCE_DATA textureInitialData = {
    &initialTextureRow[0], // pSysMem
    0, // SysMemPitch
    0 }; // SysMemSlicePitch
ID3D11Texture2D *texture = nullptr;
ERR_GUARD_DIRECTX( m_Dev->CreateTexture2D(&textureDesc, &textureInitialData, &texture) );

The DirectX debug layer reports an error:

D3D11 ERROR: ID3D11Device::CreateTexture2D: pInitialData[0].SysMemPitch cannot be 0 [ STATE_CREATION ERROR #100: CREATETEXTURE2D_INVALIDINITIALDATA]

Dear Microsoft: Why? :)
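For the record, the obvious workaround (a sketch reusing the names from the listing above) is to allocate and fill data for the full texture and pass the real pitch:

// Workaround: provide data for the whole texture, not just one row,
// and pass a non-zero pitch.
std::vector<uint32_t> initialTextureData((size_t)TEXTURE_SIZE.x * (size_t)TEXTURE_SIZE.y, 0);
D3D11_SUBRESOURCE_DATA textureInitialData = {
    &initialTextureData[0], // pSysMem
    (UINT)(TEXTURE_SIZE.x * sizeof(uint32_t)), // SysMemPitch - the real row size
    0 }; // SysMemSlicePitch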

Comments | #directx Share

# Direct3D 12 - Watch out for non-uniform resource index!

15:30, Mon 28 Sep 2015

DirectX 12 allows us to use arrays of resources (descriptor tables) and refer to them from shader code by an index. The index can be constant or variable. But there is a big trap waiting for shader developers doing that! Microsoft has just updated their public documentation in this regard (see Resource Binding in HLSL), so now I can talk about it.

In D3D12, a resource index is expected to be uniform (convergent) by default. It means the index, even if dynamic, should be the same across the whole draw call (all vertices/pixels/etc.). For example, it can be the result of some calculation that depends on parameters coming from a constant buffer. Example code:

Texture2D<float4> textures[16] : register(t0);
SamplerState samp : register(s0);
struct SConstantBuffer
{
    uint MaterialId;
};
ConstantBuffer<SConstantBuffer> cb : register(b0);

float4 PS(
    in float2 texCoords : TEXCOORD0 ) : SV_Target
{
    uint materialId = cb.MaterialId;
    return textures[materialId].Sample(samp, texCoords);
}

Why is it this way? Because at a low level, GPUs are made of SIMD processors. Each of these small processors executes shader instructions on multiple "SIMD threads" (or "warps", or "wavefronts", whatever you call them), not on a single (scalar) value. Because of that, knowing that the resource index will always be the same across all SIMD threads enables some optimizations and more efficient execution.

Resource indices in Direct3D 12 actually can be non-uniform (divergent) - different in every vertex/pixel/etc., for example when they are the result of calculations based on vertex attributes or pixel position. But they must then be wrapped in a special "pseudo-function" in HLSL called NonUniformResourceIndex. For example:

Texture2D<float4> textures[16] : register(t0);
SamplerState samp : register(s0);

float4 PS(
    in float2 texCoords : TEXCOORD0,
    in uint materialId : MATERIAL_ID ) : SV_Target
{
    return textures[NonUniformResourceIndex(materialId)].Sample(samp, texCoords);
}

This may look a bit unintuitive, so I expect many developers will make the mistake of omitting this function. If you use a non-uniform value as a resource index but forget to mark it with NonUniformResourceIndex, the HLSL compiler probably won't warn you about it. It may even work in some cases and on some GPUs, while giving invalid (undefined) results on others. So, similarly to the issue with reduced precision in normalize/length operations in GLSL, this is a thing to be careful with when coding your shaders in HLSL using the new Shader Model 5.1.

Comments | #directx Share

