# Why Not Use Heterogeneous Multi-GPU?
There was an interesting discussion recently on one Slack channel about using integrated GPU (iGPU) together with discrete GPU (dGPU). Many sound ideas were said there, so I think it's worth writing them down. But because I probably never blogged about multi-GPU before, few words of introduction first:
The idea to use multiple GPUs in one program is not new, but not very widespread either. In old graphics APIs like Direct3D 11 it wasn't easy to implement. Doing it right in a complex game often involved engaging driver engineers from the GPU manufacturer (like AMD, NVIDIA) or using custom vendor extensions (like AMD GPU Services - see for example Explicit Crossfire API).
New generation of graphics APIs – Direct3D 12 and Vulkan – are lower level, give more direct access to the hardware. This includes the possibility to implement multi-GPU support on your own. There are two modes of operation. If the GPUs are identical (e.g. two graphics cards of the same model plugged to the motherboard), you can use them as one device object. In D3D12 you then index them as Node 0, Node 1, ... and specify
NodeMask bit mask when allocating GPU memory, submitting commands and doing all sorts of GPU things. Similarly, in Vulkan you have VK_KHR_device_group extension available that allows you to create one logical device object that will use multiple physical devices.
But this post is about heterogeneous/asymmetric multi-GPU, where there are two different GPUs installed in the system, e.g. one integrated with the CPU and one discrete. A common example is a laptop with "switchable graphics", which may have an Intel CPU with their integrated “HD” graphics plus a NVIDIA GPU. There may even be two different GPUs from the same manufacturer! My new laptop (ASUS TUF Gaming FX505DY) has AMD Radeon Vega 8 + Radeon RX 560X. Another example is a desktop PC with CPU-integrated graphics and a discrete graphics card installed. Such combination may still be used by a single app, but to do that, you must create and use two separate Device objects. But whether you could, doesn't mean you should…
First question is: Are there games that support this technique? Probably few… There is just one example I heard of: Ashes of the Singularity by Oxide Games, and it was many years ago, when DX12 was still fresh. Other than that, there are mostly tech demos, e.g. "WITCH CHAPTER 0 [cry]" by Square Enix as described on DirectX Developer Blog (also 5 years old).
iGPU typically has lower computational power than dGPU. It could accelerate some pieces of computations needed each frame. One idea is to hand over the already rendered 3D scene to the iGPU so it can finish it with screen-space postprocessing effects and present it, which sounds even better if the display is connected to iGPU. Another option is to accelerate some computations, like occlusion culling, particles, or water simulation. There are some excellent learning materials about this technique. The best one I can think of is: Multi-Adapter with Integrated and Discrete GPUs by Allen Hux (Intel), GDC 2020.
However, there are many drawbacks of this technique, which were discussed in the Slack chat I mentioned:
Conclusion: Supporting heterogeneous multi-GPU in a game engine sounds like an interesting technical challenge, but better think twice before doing it in a production code.
BTW If you just want to use just one GPU and worry about the selection of the right one, see my old post: Switchable graphics versus D3D11 adapters.
# How to Disable Notification Sound in Messenger for Android?
Applications and websites fight for our attention. We want to stay connected and informed, but too many interruptions are not good for our productivity or mental health. Different applications have different settings dedicated to silencing notifications. I recently bought a new smartphone and so I needed to install and configure all the apps (which is a big task these days, same way as it always used to be with Windows PC after "format C:" and system reinstall).
Facebook Messenger for Android offers an on/off setting for all the notifications, and a choice of the sound of a notification and an incoming call. Unfortunately, it doesn't offer an option to silence the sound. You can only either choose among several different sound effects or disable all notifications of the app entirely. What if you want to keep notifications active so they appear in the Android drawer, use vibration, get sent to a smart band, and you can hear incoming calls ringing, you just want to mute the sound of incoming messages?
Here is the solution I found. It turns out you can upload a custom sound file to your smartphone and use it. For that I generated a small WAV file - 0.1 seconds of total silence. 1) You can download it from here:
Silence_100ms.wav (8.65 KB)
2) Now you need to put it into a specific directory in the memory of your smartphone, called "Notifications". To do this, you need to use an app that allows to freely manipulate files and directories, as opposed to just looking for specific content as image or music players do. If you downloaded the file directly to your smartphone, use free Total Commander to move this file to the "Notifications" directory. If you have it on your PC, MyPhoneExplorer will be a good app to connect to your phone using a USB cable or WiFi network and transfer the file.
3) Finally, you need to select the file in Messenger. To do this, go to its settings > Notifications & Sounds > Notification Sound. The new file "Silence_100ms" should appear mixed with the list of default sound effects. After choosing it, your message notifications in Messenger will be silent.
There is one downside of this method. While not audible, the sound is still playing on every incoming message, so if you listen to music e.g. using Spotify, the music will fade out for a second every time the sound is played.
# Avoid double negation, unless...
Boolean algebra is a branch of mathematics frequently used in computer science. It deals with simple operations like AND, OR, NOT - which we also use in natural language.
In programming, two negations of a boolean variable cancel each other. You could express it in C++ as:
!!x == x
In natural languages, it's not the case. In English we don't use double negatives (unless it's intended, like in the famous song quote "We don't need no education" :) In Polish double negatives are used but don't result in a positive statement. For example, we say "Wegetarianie nie jedzą żadnego rodzaju mięsa.", which literally translates as "Vegetarians don't eat no kind of meat."
No matter what your native language is, in programming it's good to simplify things and so to avoid double negations. For example, if you design an API for your library and you need to name a configuration setting about an optional optimization Foo, it's better to call it "FooOptimizationEnabled". You can then just check if it's true and if so, you do the optimization. If the setting was called "FooOptimizationDisabled", its documentation could say: "When the FooOptimizationDisabled setting is disabled, then the optimization is enabled." - which sounds confusing and we don't want that.
But there are two cases where negative flags are justified in the design of an API. Let me give examples from the world of graphics.
1. When enabled should be the default. It's easier to set flags to 0 or to a minimum set of required flags rather than thinking about each available flag whether or not it should be set, so a setting that most users will need enabled and disable only on special occasions could be a negative flag. For example, Direct3D 12 has
D3D12_HEAP_FLAG_DENY_BUFFERS. A heap you create can host any GPU resources by default (buffers and textures), but when you use this flag, you declare you will not use it for buffers. (Note to graphics programmers: I know this is not the best example, because usage of these flags is actually required on Resource Heap Tier 1 and there are also "ALLOW" flags, but I hope you get the idea.)
2. For backward compatibility. If something has always been enabled in previous versions and you want to give users a possibility to disable it in the new version of your library, you don't want to break their old code or ask them to update their code everywhere adding the new "enable" flag, so it's better to add a negative flag that will disable the otherwise still enabled feature. That's what Vulkan does with
VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT. There was no such flag in Vulkan 1.0 - pipelines were always optimized. The latest Vulkan 1.2 has it so you can ask to disable optimization, which may speed up the creation of the pipeline. All the existing code that doesn't use this flag continues to have its pipelines optimized.
# On Debug, Release, and Other Project Configurations
Foreword: I was going to write a post about
#pragma optimize in Visual Studio, which I learned recently, but later I decided to describe the whole topic more broadly. As a result, this blog post can be useful or inspiring to every programmer coding in C++ or even in other languages, although I give examples based on just C++ as used in Microsoft Visual Studio on Windows.
When we compile a program, we often need to choose one of possible "configurations". Visual Studio creates two of those for a new C++ project, called "Debug" and "Release". As their names imply, the first one is mostly intended to be used during development and debugging, while the other should be used to generate the final binary of the program to be distributed to the external users. But there is more to it. Each of these configurations actually sets multiple parameters of the project and you can change them. You can also define your custom configurations and have over 2 of them, which can be very useful, as you will see later.
First, let's think about the specific settings that are defined by a project configuration. They can be divided into two broad categories. First one is all the parameters that control the compiler and linker. The difference between Debug and Release is mostly regarding optimizations. Debug configuration is all about having the optimizations disabled (which allows full debugging functionality and also makes the compilation time short), while Release has the optimizations enabled (which obviously makes the program run faster). For example, Visual Studio sets these options in Release:
Visual Studio also inserts an additional code in Debug configuration to fill memory with some bit pattern that helps with debugging low-level memory access errors, which are plaguing C and C++ programmers. For example, seeing 0xCCCCCCCC in the debugger usually means uninitialized memory on the stack, 0xCDCDCDCD - allocated but uninitialized memory on the heap, and 0xFEEEFEEE - memory that was already freed and should no longer be used. In Release, memory under such incorrectly used pointers will just hold its previous data.
The second category of things controlled by project configurations are specific features inside the code. In case of C and C++ these are usually enabled and disabled using preprocessor macros, like
#if. Such macros can not only be defined inside the code using
#define, but also passed from the outside, among the parameters of the compiler, and so they can be set and changed depending on the project configuration.
The features controlled by such macros can be very diverse. Probably the most canonical example is the standard
assert macro (or your custom equivalent), which we define to some error logging, instruction to break into the debugger, or even complete program termination in Debug config, and to an empty macro in Release. In case of C++ in Visual Studio, the macro defined in Debug is
_DEBUG, in Release -
NDEBUG, and depending on the latter, standard macro
assert is doing "something" or is just ignored.
There are more possibilities. Depending on these standard pre-defined macros or your custom ones, you can cut out different functionalities from the code. One example is any instrumentation that lets you analyze and profile its execution (like calls to Tracy). You probably don't want it in the final client build. Same with detailed logging functionality, any hidden developer setting or cheat codes (in case of games). On the other hand, you may want to include in the final build something that's not needed during development, like checking user's license, some anti-piracy or anti-cheat protection, and generation of certificates needed for the program to work on non-developer machines.
As you can see, there are many options to consider. Sometimes it can make sense to have over 2 project configurations. Probably the most common case is a need for a "Profile" configuration that allows to measure the performance accurately - has all the compiler optimizations enabled, but still keeps the instrumentation needed for profiling in the code. Another idea would be to wrap the super low level, frequently called checks like
(index < size()) inside
vector::operator into some separate macro called
HEAVY_ASSERT and have some configuration called "SuperDebug" that we know works very slowly, but has all those checks enabled. On the other end, remember that the "FinalFinal" configuration that you will use to generate the final binary for the users should be build and tested in your Continuous Integration during development, not only one week before the release date. Bugs that occur in only one configuration and not in the others are not uncommon!
Some bugs just don't happen in Debug, e.g. due to uninitialized memory containing consistent 0xCCCCCCCC instead of garbage data, or a race condition between threads not occurring because of a different time it takes to execute certain functions. In some projects, the Debug configuration works so slowly that it's not even possible to test the program on a real, large data set in this configuration. I consider it a bad coding practice and I think it shouldn't happen, but it happens quite often, especially when STL is used, where every reference to
myVector[i] element in the unoptimized code is a function call with a range check instead of just pointer dereferecence. In any case, sometimes we need to investigate bugs occurring in Release configuration. Not all hope is lost then, because in Visual Studio the debugger still works, just not as reliably as in Debug. Because of optimizations made by the compiler, the instruction pointer (yellow arrow) may jump across the code inconsistently, and some variables may be impossible to preview.
Here comes the trick that inspired me to write this whole blog post. I recently learned that there is this custom Microsoft preprocessor macro:
#pragma optimize("", off)
that if you put at the beginning of a .cpp file or just before your function of interest, disables all the compiler optimizations from this point until the end of the file, making its debugging nice and smooth, while the rest of the program behaves as before. (See also its documentation.) A nice trick!
# Improving the quality of the alpha test (cutout) materials
This is a guest post from my friend Łukasz Izdebski Ph.D.
Today I want to share with you a trick which my collage from previews work mentioned to me a long time ago. It's about alpha tested (also known as cutout) materials. This technique which I want to share with you consists of two neat tricks that can improve the quality of alpha tested (cutout) materials.
Alpha test is an old technique used in computer graphics. The idea behind it is very simple. In a very basic form, a material (shader) of a rendered object can discard processed pixels based on the alpha channel of RGBA texture. When shaded pixel’s final alpha value is less than this threshold value (threshold value is constant for the instance of the material and a typical value is 50%), it is clipped (discarded) and will not land in the shaders output framebuffer. These types of materials are commonly used to render vegetation, fences, impostors/billboards, etc.
Alpha tested materials have some, I will say a little issue. It can be noticed when rendered object (with this material) is far away from the camera. Let the following video below be an example of this issue.
# Secrets of Direct3D 12: Resource Alignment
In the new graphics APIs - Direct3D 12 and Vulkan - creation of resources (textures and buffers) is a multi-step process. You need to allocate some memory and place your resource in it. In D3D12 there is a convenient function
ID3D12Device::CreateCommittedResource that lets you do it in one go, allocating the resource with its own, implicit memory heap, but it's recommended to allocate bigger memory blocks and place multiple resources in them using
When placing a resource in the memory, you need to know and respect its required size and alignment. Size is basically the number of bytes that the resource needs. Alignment is a power-of-two number which the offset of the beginning of the resource must be multiply of (
offset % alignment == 0). I'm thinking about writing a separate article for beginners explaining the concept of memory alignment, but that's a separate topic...
Back to graphics, in Vulkan you first need to create your resource (e.g.
vkCreateBuffer) and then pass it to the function (e.g.
vkGetBufferMemoryRequirements) that will return required size of alignment of this resource (
alignment). In DirectX 12 it looks similar at first glance or even simpler, as it's enough to have a structure
D3D12_RESOURCE_DESC describing the resource you will create to call
ID3D12Device::GetResourceAllocationInfo and get
D3D12_RESOURCE_ALLOCATION_INFO - a similar structure with
Alignment. I've described it briefly in my article Differences in memory management between Direct3D 12 and Vulkan.
But if you dig deeper, there is more to it. While using the mentioned function is enough to make your program working correctly, applying some additional knowledge may let you save some memory, so read on if you want to make your GPU memory allocator better. First interesting information is that alignments in D3D12, unlike in Vulkan, are really fixed constants, independent of a particular GPU or graphics driver that the user may have installed.
So, we have these constants and we also have a function to query for actual alignment. To make things even more complicated, structure
Alignment member, so you have one alignment on the input, another one on the output! Fortunately,
GetResourceAllocationInfo function allows to set
D3D12_RESOURCE_DESC::Alignment to 0, which causes default alignment for the resource to be returned.
Now, let me introduce the concept of "small textures". It turns out that some textures can be aligned 4 KB and some MSAA textures can be aligned to 64 KB. They call this "small" alignment (as opposed to "default" alignment) and there are also constants for it:
|Texture||64 KB||4 KB|
|MSAA texture||4 MB||64 KB|
Using this smaller alignment allows to save some GPU memory that would otherwise be unused as padding between resources. Unfortunately, it's unavailable for buffers and available only for small textures, with a very convoluted definition of "small". The rules are hidden in the description of Alignment member of D3D12_RESOURCE_DESC structure:
GetResourceAllocationInfo calculate all this automatically and just return optimal alignment for a resource, like Vulkan function does? Possibly, but this is not what happens. You have to ask for it explicitly. When you pass
D3D12_RESOURCE_DESC::Alignment = 0 on the input, you always get the default (larger) alignment on the output. Only when you set
D3D12_RESOURCE_DESC::Alignment to the small alignment value, this function returns the same value if the small alignment has been "granted".
There are two ways to use it in practice. First one is to calculate the eligibility of a texture to use small alignment on your own and pass it to the function only when you know the texture fulfills the conditions. Second is to try the small alignment always. When it's not granted,
GetResourceAllocationInfo returns some values other than expected (in my test it's
Alignment = 64 KB and
SizeInBytes = 0xFFFFFFFFFFFFFFFF). Then you should call it again with the default alignment. That's the method that Microsoft shows in their "Small Resources Sample". It looks good, but a problem with it is that calling this function with an alignment that is not accepted generates D3D12 Debug Layer error #721 CREATERESOURCE_INVALIDALIGNMENT. Or at least it used to, because on one of my machines the error no longer occurs. Maybe Microsoft fixed it in some recent update of Windows or Visual Studio / Windows SDK?
Here comes the last quirk of this whole D3D12 resource alignment topic: Alignment is applied to offset used in
CreatePlacedResource, which we understand as relative to the beginning of an
ID3D12Heap, but the heap itself has an alignment too!
D3D12_HEAP_DESC structure has
Alignment member. There is no equivalent of this in Vulkan. Documentation of D3D12_HEAP_DESC structure says it must be 64 KB or 4 MB. Whenever you predict you might create an MSAA textures in a heap, you must choose 4 MB. Otherwise, you can use 64 KB.
Thank you, Microsoft, for making this so complicated! ;) This article wouldn't be complete without the advertisement of open source library: D3D12 Memory Allocator (and similar Vulkan Memory Allocator), which automatically handles all this complexity. It also implements both ways of using small alignment, selectable using a preprocessor macro.
# Links to GDC 2020 Talks and More
March is an important time of year for game developers, as that's when Game Developers Conference (GDC) takes place - the most important conference of the industry. This year's edition has been cancelled because of coronavirus pandemic, just like all other events, or rather postponed to a later date. But many companies prepared their talks anyway. Surely, they had to submit their talks long time ago, plus any preparation, internal technical and legal reviews... The time spent on this shouldn't be wasted. That's why many of them shared their talks online as videos and/or slides. Below, I try to gather links to these materials with a full list of titles, with special focus on programming talks.
They organized multi-day "Virtual Talks" event presented on Twitch, with replays now available to watch and slides accessible on their website.
Monday, March 16
The 'Kine' Postmortem
Storytelling with Verbs: Integrating Gameplay and Narrative
Intrinsically Motivated Teams: The Manager's Toolbox
From 'Assassin's Creed' to 'The Dark Eye': The Importance of Themes
Representing LGBT+ Characters in Games: Two Case Studies
The Sound of Anthem
Is Your Game Cross-Platform Ready?
Forgiveness Mechanics: Reading Minds for Responsive Gameplay
Experimental AI Lightning Talk: Hyper Realistic Artificial Voices for Games
Tuesday, March 17
What to Write So People Buy: Selling Your Game Without Feeling Sleazy
Failure Workshop: FutureGrind: How To Make A 6-Month Game In Only 4.5 Years
Stress-Free Game Development: Powering Up Your Studio With DevOps
Baked in Accessibility: How Features Were Approached in 'Borderlands 3'
Matchmaking for Engagement: Lessons from 'Halo 5'
Forget CPI: Dynamic Mobile Marketing
Integrating Sound Healing Methodologies Into Your Workflow
From 0-1000: A Test Driven Approach to Tools Development
Overcoming Creative Block on 'Super Crush KO'
When Film, Games, and Theatre Collide
Wednesday, March 18
Bringing Replays to 'World of Tanks: Mercenaries'
Developing and Running Neural Audio in Constrained Environments
Mental Health State of the Industry: Past, Present & Future
Empathizing with Steam: How People Shop for Your Game
Scaling to 10 Concurrent Users: Online Infrastructure as an Indie
Crafting A Tiny Open World: 'A Short Hike' Postmortem
Indie Soapbox: UI design is fun!
Don't Ship a Product, Ship Value: Start Your Minimum Viable Product With a Solution
Day of the Devs: GDC Edition Direct
Independent Games Festival & Game Developers Choice Awards
Thursday, March 19
Machine Learning for Optimal Matchmaking
Skill Progression, Visual Attention, and Efficiently Getting Good at Esports
Making Your Game Influencer Ready: A Marketing Wishlist for Developers
How to Run Your Own Career Fair on a Tiny Budget
Making a Healthy Social Impact in Commercial Games
'Forza' Monthly: Live Streaming a Franchise
Aesthetic Driven Development: Choosing Your Art Before Making a Game
Reading the Rules of 'Baba Is You'
Friday, March 20
Beyond Games as a Service with Live Ops
Kill the Hero, Save the (Narrative) World
'Void Bastards' Art Style Origin Story
Writing Tools Faster: Design Decisions to Accelerate Tool Development
Face-to-Parameter Translation via Neural Network Renderer
The Forest Paths Method for Accessible Narrative Design
'Gears 5' Real-Time Character Dynamics
Stop & Think: Teaching Players About Media Manipulation in 'Headliner'
They organized "DirectX Developer Day" where they announced DirectX 12 Ultimate - a fancy name for the updated Direc3D 12_2 with new major features including DXR (Ray Tracing), Variable Rate Shading, and Mesh Shaders.
DXR 1.1 Inline Raytracing
Advanced Mesh Shaders
Reinventing the Geometry Pipeline: Mesh Shaders in DirectX 12
DirectX 12 Sampler Feedback
PIX on Windows
That's actually GPU Technology Conference (GTC) - a separate event. Their biggest announcement this month was probably DLSS 2.0.
RTX-Accelerated Hair Brought to Life with NVIDIA Iray
Material Interoperability Using MaterialX, Standard Surface, and MDL
The Future of GPU Raytracing
Visuals as a Service (VaaS): How Amazon and Others Create and Use Photoreal On-Demand Product Visuals with RTX Real-Time Raytracing and the Cloud
Next-Gen Rendering Technology at Pixar
New Features in OptiX 7
Production-Quality, Final-Frame Rendering on the GPU
Latest Advancements for Production Rendering with V-Ray GPU and Real-Time Raytracing with Project Lavina
Accelerated Light-Transport Simulation using Neural Networks
Bringing the Arnold Renderer to the GPU
Supercharging Adobe Dimension with RTX-Enabled GPU Raytracing
Sharing Physically Based Materials Between Renderers with MDL
Real-Time Ray-Traced Ambient Occlusion of Complex Scenes using Spatial Hashing
I also found some other videos on Google:
DLSS - Image Reconstruction for Real-time Rendering with Deep Learning
NVIDIA Vulkan Features Update – including Vulkan 1.2 and Ray Tracing
3D Deep Learning in Function Space
Unleash Computer Vision at the Edge with Jetson Nano and Always AI
Optimized Image Classification on the Cheap
Cisco and Patriot One Technologies Bring Machine Learning Projects from Imagination to Realization (Presented by Cisco)
AI @ The Network Edge
Animation, Segmentation, and Statistical Modeling of Biological Cells Using Microscopy Imaging and GPU Compute
Improving CNN Performance with Spatial Context
Weakly Supervised Training to Achieve 99% Accuracy for Retail Asset Protection
Combating Problems Like Asteroid Detection, Climate Change, Security, and Disaster Recovery with GPU-Accelerated AI
Condensa: A Programming System for DNN Model Compression
AI/ML with vGPU on Openstack or RHV Using Kubernetes
CTR Inference Optimization on GPU
NVIDIA Tools to Train, Build, and Deploy Intelligent Vision Applications at the Edge
Leveraging NVIDIA’s Technology for the Ultimate Industrial Autonomous Transport Robot
How to Create Generalizable AI?
Isaac Sim 2020 Deep Dive
But somehow I can't find their full list with links to them anywhere on their website. More talks are accessible after free registration on the event website.
Multi-Adapter with Integrated and Discrete GPUs
Optimizing World of Tanks*: from Laptops to High-End PCs
Intel® oneAPI Rendering Toolkit and its Application to Games
Intel® ISPC in Unreal Engine 4: A Peek Behind the Curtain
Variable Rate Shading with Depth of Field
For the Alliance! World of Warcraft and Intel discuss an Optimized Azeroth
Intel® Open Image Denoise in Blender - GDC 2020
Variable Rate Shading Tier 1 with Microsoft DirectX* 12 from Theory to Practice
Does Your Game's Performance Spark Joy? Profiling with Intel® Graphics Performance Analyzers
Boost CPU performance with Intel® VTune Profiler
DeepMotion | Optimize CPU Performance with Intel VTune Profiler
Google for Games Developer Summit 2020 @ YouTube (a collection of playlists)
Google for Games Developer Summit Keynote
What's new in Android game development tools
What's new in Android graphics optimization tools
Android memory tools and best practices
Deliver higher quality games on more devices
Google Play Asset Delivery for games: Product deep dive and case studies
Protect your game's integrity on Google Play
Accelerate your business growth with leading ad strategies
Firebase games SDK news
Cloud Firestore for Game Developers
Google for Games Developer Summit Keynote
Scaling globally with Game Servers and Agones (Google Games Dev Summit)
How to make multiplayer matchmaking easier and scalable with Open Match (Google Games Dev Summit)
Unity Game Simulation: Find the perfect balance with Unity and GCP (Google Games Dev Summit)
How Dragon Quest Walk handled millions of players using Cloud Spanner (Google Games Dev Summit)
Building gaming analytics online services with Google Cloud and Improbable (Google Games Dev Summit)
Google for Games Developer Summit Keynote
Bringing Destiny to Stadia: A postmortem (Google Games Dev Summit)
Stadia Games & Entertainment presents: Creating for content creators (Google Games Dev Summit)
Empowering game developers with Stadia R&D (Google Games Dev Summit)
Stadia Games & Entertainment presents: Keys to a great game pitch (Google Games Dev Summit)
Supercharging discoverability with Stadia (Google Games Dev Summit)
Online Game Technology Summit: Start-And-Discard: A Unified Workflow for Development and Live
Finding Space for Sound: Environmental Acoustics
Game Server Performance
NPC Voice Design
Machine Learning Summit: Ragdoll Motion Matching
Machine Learning, Physics Simulation, Kolmogorov Complexity, and Squishy Bunnies
AMD: No information.
Sony: No information.
Consoles: Last but not least, March 2020 was also the time when the details of the upcoming new generation of consoles have been announced - Xbox Series S and PlayStation 5. You can easily find information about them by searching the Internet, so I won't recommend any links.
If you know about any more GDC 2020 or other important talks related to programming that have been released recently, please contact me or leave a comment below and I will add them!
Maybe there a positive side of this pandemic? With GDC taking place, developers had to pay $1000+ entrance fee for the event. They had to book a flight to California and a hotel in San Francisco, which was prohibitively expensive for many. They had to apply for ESTA or a vista to the US, which not everyone could get. And the talks eventually landed behind a paywall, scoring even more money to the organizers. Now we can educate ourselves for free from the safety and convenience of our offices and homes.
# Initializing DX12 Textures After Allocation and Aliasing
If you are a graphics programmer using Direct3D 12, you may wonder what's the initial content of a newly allocated buffer or texture. Microsoft admitted it was not clearly defined, but in practice such new memory is filled with zeros (unless you use the new flag
D3D12_HEAP_FLAG_CREATE_NOT_ZEROED). See article “Coming to DirectX 12: More control over memory allocation”. This behavior has its pros and cons. Clearing all new memory makes sense, as the operating system surely doesn't want to disclose to us the data left by some other process, possibly containing passwords or other sensitive information. However, writing to a long memory region takes lots of time. Maybe that's one reason GPU memory allocation is so slow. I've seen large allocations taking even hundreds of milliseconds.
There are situations when the memory of your new buffer or texture is not zero, but may contain some random data. First case is when you create a resource using CreatePlacedResource function, inside a memory block that you might have used before for some other, already released resources. That's also what D3D12 Memory Allocator library does by default.
It is important to know that in this case you must initialize the resource in a specific way! The rules are described on page: “ID3D12Device::CreatePlacedResource method” and say: If your resource is a texture that has either
DEPTH_STENCIL flag, you must initialize it after allocation and before any other usage using one of those methods:
Please note that rendering to the texture as a Render Target or writing to it as an Unordered Access View is not on the list! It means that, for example, if you implement a postprocessing effect, you allocated an intermediate 1920x1080 texture, and you want to overwrite all its pixels by rendering a fullscreen quad or triangle (better to use one triangle - see article "GCN Execution Patterns in Full Screen Passes"), then initializing the texture before your draw call seems redundant, but you still need to do it.
What happens if you don't? Why are we asked to perform this initialization? Wouldn't we just see random colorful pixels if we use an uninitialized texture, which may or may not be a problem, depending on our use case? Not really... As I explained in my previous post “Texture Compression: What Can It Mean?”, a texture may be stored in video memory in some vendor-specific, compressed format. If the metadata of such compression are uninitialized, it might have consequences more severe than observing random colors. It's actually an undefined behavior. On one GPU everything may work fine, while on the other you may see graphical corruptions that even rendering to the texture as a Render Target cannot fix (or a total GPU crash maybe?) I've experienced this problem myself recently.
Thinking in terms of internal GPU texture compression also helps to explain why is this initialization required only for render-target and depth-stencil textures. GPUs use more aggressive compression techniques for those. Having the requirements for initialization defined like that implies that you can leave buffers and other textures uninitialized and just experience random data in their content without the danger of anything worse happening.
I feel that a side note on
ID3D12GraphicsCommandList::DiscardResource function is needed, because many of you probably don't know it. Contrary to its name, this function doesn't release a resource or its memory. The meaning of this function is more like the mapping flag
D3D11_MAP_WRITE_DISCARD from the old D3D11. It informs the driver that the current content of the resource might be garbage; we know about it, and we don't care, we don't need it, not going to read it, just going to fill the entire resource with a new content. Sometimes, calling this function may let the driver reach better performance. For example, it may skip downloading previous data from VRAM to the graphics chip. This is especially important and beneficial on tile-based, mobile GPUs. In some other cases, like the initialization of a newly allocated texture described here, it is required. Inside of it, driver might for example clear the metadata of its internal compression format. It is correct to call
DiscardResource and then render to your new texture as a Render Target. It could also be potentially faster than doing
ClearRenderTargetView instead of
DiscardResource. By the way, if you happen to use Vulkan and still read that far, you might find it useful to know that the Vulkan equivalent of
DiscardResource is an image memory barrier with
oldLayout = VK_IMAGE_LAYOUT_UNDEFINED.
There is a second case when a resource may contain some random data. It happens when you use memory aliasing. This technique allows to save GPU memory by creating multiple resources in the same or overlapping region of a
ID3D12Heap. It was not possible in old APIs (Direct3D 11, OpenGL) where each resource got its own implicit memory allocation. In Direct3D you can use
CreatePlacedResource to put your resource in a specific heap, at a specific offset. It's not allowed to use aliasing resources at the same time. Sometimes you need some intermediate buffers or render targets only for a specific, short time during each frame. You can then reuse their memory for different resources needed in later part of the frame. That's the key idea of aliasing.
To do it correctly, you must do two things. First, between the usages you must issue a barrier of special type
D3D12_RESOURCE_BARRIER_TYPE_ALIASING. Second, the resource to be used next (also called "ResourceAfter", as opposed to "ResourceBefore") needs to be initialized. The idea is like what I described before. You can find the rules of this initialization on page “Memory Aliasing and Data Inheritance”. This time however we are told to initialize every texture that has
DEPTH_STENCIL flag with 1. a clear or 2. a copy operation to an entire subresource.
DiscardResource is not allowed. Whether it's an omission or intentional, we have to stick to these rules, even if we feel such clears are redundant and will slow down our rendering. Otherwise we may experience hard to find bugs on some GPUs.
Update 2020-07-14: An engineer from Microsoft told me that the lack of
DiscardResource among valid methods of initializing a texture after aliasing is probably a docs oversight and it is correct to initialize it this way.
.net algorithms books c++ career commonlib competitions compo demoscene directx engine events flash fractals gallery games ggj gpu graphics gui hardware homepage humor ideas igk java libraries life linux literature math mobile music networking optimization origami philosophy photography politics productions psytrance redmond rendering scripts shopping software software engineering stl studies teaching tools video visual studio vj vulkan warsztat web webdev winapi windows