# Vulkan API - my talk at Warsaw University of Technology

Apr 2018

On Wednesday 16 April, around 8 PM, at Warsaw University of Technology, during the weekly meeting of KNTG Polygon, I will give a talk about "Vulkan API" (in Polish). Come if you want to hear about the new generation of graphics APIs, see what the Vulkan API looks like, what tools are there to support it, what the advantages and disadvantages of using such an API are, and finally decide whether learning Vulkan is a good idea for you.

Event on Facebook:

Vulkan API.pdf
Vulkan API.pptx

#graphics #gpu #vulkan #teaching

# Memory management in Vulkan and DX12: slides are online

Apr 2018

Slides from my talk at Game Developers Conference (GDC) 2018: "Memory management in Vulkan and DX12" are now available online, as part of the materials from the Advanced Graphics Techniques Tutorial. Access to this PDF is open to anyone, not behind the GDC Vault paywall. I've put some additional information in "backup" slides at the end that I didn't show during my presentation. The slides are designed so that you can learn from them even without seeing the talk.

#vulkan #directx #gdc #teaching #events

# Debugging Vulkan driver crash - equivalent of NVIDIA Aftermath

Mar 2018

New generation, explicit graphics APIs (Vulkan and DirectX 12) are more efficient and involve less CPU overhead. Part of it is that they don't check most errors. In old APIs (Direct3D 9, OpenGL) every function call was validated internally and returned a success or failure code, while a driver crash indicated a bug in driver code. New APIs, on the other hand, rely on the developer doing the right thing. Of course some functions still return an error code (especially ones that allocate memory or create some resource), but those that record commands into a command buffer just return void. If you do something illegal, you can expect undefined behavior. You can use Validation Layers / Debug Layer to do some checks, but otherwise everything may work fine on some GPUs, you may get an incorrect result, or you may experience a driver crash or timeout (called "TDR"). The good thing is that (contrary to old Windows XP) a crash inside the graphics driver doesn't cause a "blue screen of death" or machine restart. The system just restarts the graphics hardware and driver, while your program receives the VK_ERROR_DEVICE_LOST code from one of the functions like vkQueueSubmit. Unfortunately, you then don't know which specific draw call or other command caused the crash.

NVIDIA proposed a solution for that: they created the NVIDIA Aftermath library. It lets you (among other things) record commands that write custom "marker" data to a buffer that survives a driver crash, so you can later read it and see which command was successfully executed last. Unfortunately, this library works only with NVIDIA graphics cards and only in D3D11 and D3D12.

I was looking for a similar solution for Vulkan. When I saw that Vulkan can "import" external memory, I thought that maybe I could use the function vkCmdFillBuffer to write an immediate value to such a buffer and this way implement the same logic. I then started experimenting with extensions: VK_KHR_get_physical_device_properties_2, VK_KHR_external_memory_capabilities, VK_KHR_external_memory, VK_KHR_external_memory_win32, VK_KHR_dedicated_allocation. I was basically trying to somehow allocate a piece of system memory and import it to Vulkan to write to it as a Vulkan buffer. I tried many things: CreateFileMapping + MapViewOfFile, HeapCreate + HeapAlloc and other ways, with various flags, but nothing worked for me. I also couldn't find any description or sample code of how these extensions could be used in Windows to import some system memory as a Vulkan buffer.

Everything changed when I learned that creating normal device memory and a buffer inside Vulkan is enough! It survives a driver crash, so its content can be read later via a mapped pointer. No extensions required. I don't think this is guaranteed by the specification, but it seems to work on both AMD and NVIDIA cards. So my current solution to write markers that survive a driver crash in Vulkan is:

  1. Call vkAllocateMemory to allocate VkDeviceMemory from a memory type that has HOST_VISIBLE + HOST_COHERENT flags. (This is typically system RAM. The spec guarantees that you can always find such a type.)
  2. Map the memory using vkMapMemory to get a raw CPU pointer to its data.
  3. Call vkCreateBuffer to create a VkBuffer with VK_BUFFER_USAGE_TRANSFER_DST_BIT and bind it to that memory using vkBindBufferMemory.
  4. While recording commands to a VkCommandBuffer, use vkCmdFillBuffer to write immediate data with your custom "markers" to the buffer.
  5. If everything goes right, don't forget to call vkDestroyBuffer and vkFreeMemory during shutdown.
  6. If you experience a driver crash (receive VK_ERROR_DEVICE_LOST), read the data under the pointer to see what marker values were successfully written last and deduce which one of your commands might have caused the crash.
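Step 6 can be sketched in plain C++, without any Vulkan calls. The layout used here is only an assumption made for illustration, not necessarily the format used by any existing library: the buffer is treated as an array of 32-bit slots that the CPU zeroes before submission, and the GPU is assumed to write the hypothetical value `kMarkerWritten` into slot `i` with the i-th vkCmdFillBuffer call. The names `kMarkerWritten`, `kNoMarker` and `FindLastWrittenMarker` are made up for this example, and `mapped` stands for the pointer obtained from vkMapMemory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical value that each vkCmdFillBuffer call writes into its own slot.
constexpr std::uint32_t kMarkerWritten = 0xC0FFEEu;

// Returned when no slot contains a marker.
constexpr std::size_t kNoMarker = static_cast<std::size_t>(-1);

// Scans the mapped marker buffer after VK_ERROR_DEVICE_LOST and returns the
// index of the last slot that contains a marker, or kNoMarker if there is none.
std::size_t FindLastWrittenMarker(const std::uint32_t* mapped, std::size_t slotCount)
{
    assert(mapped != nullptr || slotCount == 0);
    std::size_t last = kNoMarker;
    for (std::size_t i = 0; i < slotCount; ++i)
    {
        if (mapped[i] == kMarkerWritten)
            last = i; // Remember the highest slot seen so far.
    }
    return last;
}
```

With slots 0 to 2 written and the remaining ones still zero, the function returns 2, so the commands recorded after marker 2 would be the first suspects.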

There is also a new extension available on the latest AMD drivers: VK_AMD_buffer_marker. It adds just one function: vkCmdWriteBufferMarkerAMD. It works similarly to the aforementioned vkCmdFillBuffer, but it adds two good things that let you write your markers with much better granularity:

  • It can be called both inside and outside a render pass, while vkCmdFillBuffer must be called outside a render pass.
  • It performs its write after the specified pipeline stage has finished executing.

I created a simple library that implements all this logic behind a simple interface. All you need to use it is this single file: VulkanAfterCrash.h.

Update 4 April 2018: In GDC 2018 talk "Aftermath: Advances in GPU Crash Debugging (Presented by NVIDIA)", Alex Dunn announced that a Vulkan extension from NVIDIA will also be available, called VK_NV_device_diagnostic_checkpoints, but I can see it's not publicly accessible yet.

#vulkan #graphics #libraries #productions

# Vulkan Memory Allocator 2.0.0

Mar 2018

At Game Developers Conference (GDC) last week I released the final version 2.0.0 of the Vulkan Memory Allocator library. It is now well documented and, thanks to contributions from the open source community, it compiles and works on Windows, Linux, Android, and macOS. Together with it I released VMA Dump Vis - a Python script that visualizes Vulkan memory as a picture. From now on I will continue incremental development on the "development" branch and occasionally merge to "master". Feel free to contact me if you have any feedback or suggestions, or if you find a bug.

#vulkan #libraries #productions #graphics

# Switchable graphics versus D3D11 adapters

Feb 2018

When you have a laptop with so-called "switchable graphics" (like I do in my Lenovo IdeaPad G50-80), you effectively have two GPUs. In my case, these are: the Intel GPU integrated in the i7-5500U processor and the AMD Radeon R5 M330. While programming in DirectX 11, you can enumerate these two adapters and choose any of them while creating an ID3D11Device object. For quite some time I was wondering how the various settings of this "switchable graphics" affect my app. Today I finally figured it out. Long story short: They just change the order of these adapters as visible to my program, so that the appropriate one is visible as adapter 0. Here is the full story:

It looks like the base setting is the one that can be found in Windows Settings > Power options > edit your power plan > Switchable Dynamic Graphics. (Not to be confused with "AMD Graphics Power Settings"!) When you set it to "Optimize power savings" or "Optimize performance", the application sees the Intel GPU as the first adapter.

When you choose "Maximize performance", the application sees the AMD GPU as the first adapter.

I also found that Radeon Settings (the app that comes with the AMD graphics driver) overrides this system setting. If you go to System > Switchable Graphics and make a configuration for your specific executable, then again: choosing "Power Saving" makes your app see the Intel GPU as the first adapter, while choosing "High Performance" makes the AMD GPU first.

It's as simple as that. Basically, if you always use the first adapter you find, then you follow the recommended settings of the system. You are still free to use the other adapter while creating your D3D11 device. I checked that - it works and it really uses that one.

This is especially important if you encounter a strange bug where your app hangs on one of these GPUs.
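The choice described above could look roughly like the sketch below. It deliberately leaves out the DXGI calls, so `AdapterInfo` is a hypothetical, trimmed-down stand-in for DXGI_ADAPTER_DESC (only its VendorId field is mirrored), and `ChooseAdapter` is a made-up helper name. The list is assumed to be filled in enumeration order, so index 0 is the adapter the system settings put first. The two constants are the PCI vendor IDs of AMD and Intel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// PCI vendor IDs, as reported in DXGI_ADAPTER_DESC::VendorId.
constexpr std::uint32_t kVendorAMD = 0x1002;
constexpr std::uint32_t kVendorIntel = 0x8086;

// Hypothetical stand-in for the adapter description filled during enumeration.
struct AdapterInfo
{
    std::uint32_t vendorId;
};

// Returns the index of the adapter to create the device on.
// preferredVendor == 0 means "follow the system settings": take adapter 0.
// Otherwise the first adapter of that vendor is chosen, with adapter 0 as fallback.
std::size_t ChooseAdapter(const std::vector<AdapterInfo>& adapters,
                          std::uint32_t preferredVendor)
{
    assert(!adapters.empty());
    if (preferredVendor != 0)
    {
        for (std::size_t i = 0; i < adapters.size(); ++i)
        {
            if (adapters[i].vendorId == preferredVendor)
                return i;
        }
    }
    return 0;
}
```

In a real program the chosen adapter would then be passed to D3D11CreateDevice, which requires D3D_DRIVER_TYPE_UNKNOWN whenever an explicit adapter is given. An override like `preferredVendor` can serve as an escape hatch, for example a command-line switch, when the app misbehaves on the default GPU.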

#directx

# Memory management in Vulkan and DX12 - my talk at GDC 2018

Feb 2018

If you happen to come to this year's Game Developers Conference (GDC), I'd like to invite you to my talk: "Memory management in Vulkan and DX12". During this lecture I will not only advertise my Vulkan Memory Allocator library, but I will also show technical details, tips and tricks for GPU memory management that you can use on your own when programming using Vulkan or Direct3D 12.

List of presentations made by my colleagues at AMD can be found here: GDC 2018 Presentations - GPUOpen, and list of all GDC talks is available as Session Scheduler on

#vulkan #directx #gdc #events #teaching

# When integrated graphics works better

Feb 2018

In RPG games, the more powerful your character is, the tougher and scarier the monsters you have to fight. I sometimes get a feeling that the same applies to real life - to the bugs you meet when you are a programmer. I recently blogged about the issue where a QueryPerformanceCounter call takes a long time. I've just met another weird problem. Here is my story:

I have a Lenovo IdeaPad G50-80 (80E502ENPB) laptop. It has switchable graphics: an integrated Intel GPU (part of the i7-5500U processor) and a dedicated AMD Radeon R5 M330. Of course I used to choose the AMD dedicated graphics, because it's more powerful. My application is a music visualization program. It renders graphics using Direct3D 11. It uses one ID3D11Device object and one thread for rendering, but two windows displayed on two outputs: output 1 (laptop screen) contains a window with GUI and preview, while output 2 (projector connected via VGA or HDMI) shows the main view using a borderless, topmost window covering the whole screen (but not real fullscreen as in IDXGISwapChain::SetFullscreenState). I tend to enable V-sync on output 1 (IDXGISwapChain::Present SyncInterval = 1) and disable it on output 0 (SyncInterval = 0). My rendering algorithm looks like this:

Loop over frames:
    Render scene to MainRenderTarget
    Render MainRenderTarget to OutputBackBuffer, covering whole screen
    Render MainRenderTarget to PreviewBackBuffer, on a quad
    Render ImGui to PreviewBackBuffer

So far I had just one problem with it: my framerate decreased over time. It used to drop very quickly after launching the app from 60 to 30 FPS and stabilize there, but after a few hours it was steadily decreasing to 20 FPS or even less. I couldn't identify a reason for it in my code, like a memory leak. It seemed to be related to rendering. I could somehow live with this issue - the low framerate was not that noticeable.

Suddenly this Thursday, when I wanted to test a new version of the program, I realized it hangs around a minute after launching. It was a strange situation in which the app seemed to be running normally, but it was just not rendering any new frames. I could see it still worked by inspecting CPU usage and the thread list with Process Hacker. I could minimize its windows or cover them with other windows and they preserved their content after restoring. I even captured a trace in GPUView, only to notice that the app was filling the DirectX command queue and the AMD GPU was working. Still, nothing was rendered.

That was a frightening situation for me, because I needed to have it working for this weekend. After I checked that restarting the app or the whole system didn't help, I tried to identify the cause and fix it in various ways:

1. I thought that maybe there was just some bug in the new version of my program, so I launched the previous version - one that had successfully worked before, reaching more than 10 hours of uptime. Unfortunately, the problem still occurred.

2. I thought that maybe it was a bug in the new AMD graphics driver, so I downloaded and installed the previous version, performing a "Clean install". It didn't help either.

3. In desperation, I formatted the whole hard drive and reinstalled the operating system. I planned to do it anyway, because it was a 3-year-old system, upgraded from Windows 8, and I had some other problems with it (that I don't describe here because they were unrelated to graphics). I installed the latest, clean Windows 10 with the latest updates and all the drivers. Even that didn't solve my problem. The program still hung soon after every launch.

I finally came up with an idea to switch my app to using Intel integrated graphics. It can be done in Radeon Settings > "Switchable Graphics" tab. In a popup menu for a specific executable, "High Performance" means choosing dedicated AMD GPU and "Power Saving" means choosing integrated Intel GPU. See article Configuring Laptop Switchable Graphics... for details.

It solved my problem! The program not only doesn't hang any longer, but it also maintains a stable 60 FPS now (at least it did during my 2h test). Framerate drops only when there is a scene that blends many layers together on a FullHD output - apparently this GPU cannot keep up with drawing so many pixels per second. Anyway, this is a situation where using integrated Intel graphics turns out to work better than a faster, dedicated GPU.

I still don't know what the cause of this strange bug is. Is it something in the way my app uses D3D11? Or is it a bug in a graphics driver (one of the two I need to have installed)? I'd like to investigate it further when I find some time. For now, I tend to believe that:

- The only thing that might have changed recently and broken my app was some Windows update pushed by Microsoft.

- The two issues, the one that I had before with framerate decreasing over time and the new one with total image freeze, are related. They may have something to do with switchable graphics - having two different GPUs in the system, both enabled at the same time. I suspect that maybe when I want to use the Radeon, the outputs (or one of them) are connected to the Intel GPU anyway, so the image needs to be copied and synchronized with the Intel driver.

Update 2018-02-21: After I published this post, I tried a few other things to fix the problem. For example, I updated the AMD graphics driver to the latest version 18.2.2. It didn't help. Suddenly, the problem disappeared as mysteriously as it had appeared. It happened during a single system session, without a restart. My application was hanging, and later it started working properly. The only thing that I can remember doing in between was downloading and launching UIforETW - a GUI tool for capturing Event Tracing for Windows (ETW) traces, like the ones for GPUView. I know that it automatically installs GPUView and other necessary tools on first launch, so that may have changed something in my system. Either way, now my program works on AMD graphics without a hang, reaching a few hours of uptime and maintaining 60 FPS, which only sometimes drops to 30 FPS, but it also goes back up.

#directx #gpu #windows

# 6th tip to understand legacy code

Jan 2018

I just watched a video published 2 days ago: 5 Tips to Understand Legacy Code by Jonathan Boccara. I like it a lot, yet I feel that something is missing here. The author gives 5 tips for how you can start figuring out a large codebase that is new to you.

  1. "Find a stronghold" - a small portion of the code (even single line) that you understand perfectly and expand from there, look at the code around it.
  2. "Analyze stacks" - put a breakpoint to capture a representative moment in the execution of the program (with someone's help) and look at call stack to understand layers of the code.
  3. "Start from I/O" - analyze the code that processes data at the very beginning (input data, e.g. source file) or at the end (output data).
  4. "Decoupling" - learn by trying to do some refactoring of the code, especially to decouple some of its parts.
  5. "Padded-room" - find code that doesn't depend on any other code (has limited scope) and go from there.

These are all great pieces of advice. I agree with them. But computer programs have two aspects: dynamic (the way the code executes over time - algorithms, functions, control flow) and static (the way data are stored in memory - data structures). I have a feeling that these 5 points focus mostly on the dynamic aspect, so as an advocate of "data-oriented design" I would add another point:

6. "Core data structure": Find structures, classes, variables, enums, and other definitions that describe the most important data that the program is operating on. For example, if it's a game engine, see how objects of a 3D scene are defined (some abstract base CGameObject class), how they can relate to each other (forming a tree structure or so-called scene graph), what properties they have (name, ID, position, size, orientation, color), and what kinds of them are available (mesh, camera, light). Or if it's a compiler, look for the definition of an Abstract Syntax Tree (AST) node and the enum with the list of all opcodes. Draw a UML diagram that shows what data types are defined, what member variables they contain and how they relate to each other (inheritance, composition). After visualizing and understanding that, it will be much easier to analyze the dynamic aspect - the code of algorithms that operate on this data. Together, they form the essence of the program. All the rest are just helpers.
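To make the game engine example more concrete, here is a minimal sketch of what such a core data structure might look like once you find it. Everything in it is hypothetical: `GameObject`, `ObjectKind`, `Vec3` and `CountObjects` are names invented for this illustration, and the sketch uses an enum for the kind of object instead of the inheritance from an abstract base class mentioned above, just to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Kinds of objects available in the scene.
enum class ObjectKind { Mesh, Camera, Light };

struct Vec3 { float x, y, z; };

// One node of the scene graph. The parent owns its children (composition),
// so the objects form a tree.
struct GameObject
{
    std::string name;
    ObjectKind kind;
    Vec3 position;
    std::vector<std::unique_ptr<GameObject>> children;

    GameObject(std::string n, ObjectKind k, Vec3 p)
        : name(std::move(n)), kind(k), position(p) {}

    // Creates a child node owned by this node and returns a reference to it.
    GameObject& AddChild(std::string n, ObjectKind k, Vec3 p)
    {
        children.push_back(
            std::unique_ptr<GameObject>(new GameObject(std::move(n), k, p)));
        return *children.back();
    }
};

// Example of an algorithm operating on the data: counts the given node
// and all of its descendants.
std::size_t CountObjects(const GameObject& root)
{
    std::size_t count = 1;
    for (const auto& child : root.children)
        count += CountObjects(*child);
    return count;
}
```

Once the data layout is clear like this, a function such as `CountObjects` is easy to read, because it is just a walk over the tree.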

#software engineering

Copyright © 2004-2018