All Blog Entries

All blog entries, ordered from most recent. Entry count: 1169.


# What does software have to do with the linen industry?

Thu, 28 Mar 2024

Technological advancements don't come out of nowhere. They are the sum of many small steps. Video games added interactivity to films displayed on a screen, which is why we call them "video games". Film, in turn, is a successor of theater, which dates back to ancient times. No wonder that when we make modern games, e.g. using Unreal Engine, we use concepts from theater and film, like "actor", "scene", "camera", and "light".

What inspired me to write this blog post was my visit to the Linen Industry Museum in Żyrardów, a city located not far from Warsaw, Poland. Formerly an industrial hub, Żyrardów now houses a museum dedicated to showcasing the rich history of the linen industry through a collection of preserved machinery and artifacts.

Probably the most interesting exhibit for us programmers is the Jacquard machine, which used punched cards to program the pattern to be woven into a textile. Punched cards like these later became the medium for storing programs for the first computers. In that machine, it wasn't yet programming in the sense of Turing completeness, but it surely was a digital code that controlled the machine.

It should be no surprise that in modern computer science we use the term "thread", which comes from the textile industry. Nvidia also uses the term "warp", another word from that industry. We can think of a modern GPU as a machine like the one pictured below. There are a lot of threads running in parallel. Each thread produces one pixel of a certain color. Our role as graphics programmers is to make this machine run fast, with no jams, and to make sure the correct pattern is produced on the fabric, that is, on the computer screen.

So many threads! 😀

(All photos were taken by me in the aforementioned Linen Industry Museum in Żyrardów. If you happen to be around Warsaw, Poland, make sure to visit it!)


# 20 years of my blog

Tue, 13 Feb 2024

Believe it or not, today marks the 20th anniversary of my blog. On February 13th, 2004, I published my first entry, which is still online: "Nareszcie w Sieci" (eng. "Finally on the Web"). It was Friday the 13th, but I wrote there that I don't believe in bad luck, and apparently I was right. Today, I would like to take this opportunity to look back and write a few words about this website.

This wasn't my first or last venture on the Internet. Even before I launched this page, my friends from the neighborhood in my home town of Częstochowa and I, still teenagers, formed a group that we called "ProgrameX". We had a website where we published various applications, games, articles, and art. We even created and shared custom Windows cursors and screensavers. I mentioned it in my past article "Internet in Poland - My History". By the way, we all ended up earning M.Sc. degrees in computer science and now work in the IT field. Greetings to Grzesiek, Damian, and Tomek! I was also actively involved in the Polish Internet game developers' community known as "Warsztat" (eng. "Workshop"), over the years becoming its moderator and then administrator. That website doesn't exist anymore. Its last address was warsztat.gd.

At first, I was blogging in Polish, as I didn't feel confident writing in English. It was only in June 2009 that I officially switched to English. Over these 20 years, I have gathered more than a thousand entries. This one is the 1168th. I know I could be ashamed of the old ones and remove them, as their links and images probably don't work anymore, but I still keep them, because I think some of them may provide useful knowledge. I like educating people, and I know there are always more beginners than advanced programmers in every area, so what now seems obvious to me may be an interesting new finding for someone else. I've only included a disclaimer for older entries, acknowledging that they may not reflect my current knowledge and beliefs.


# Calculating checksums of multiple files in PowerShell

Sat, 27 Jan 2024

Today I would like to share with you a small script that I developed some time ago and have used regularly since then. It calculates hashes (checksums) for multiple files and saves them to a text file. It is written in PowerShell.

A bit of background: While working with games, I often need to move large amounts of data. Packages of 150 GB or more are not uncommon. When copying, uploading, or downloading them, how can I make sure not a single bit has changed? The obvious solution is to calculate some checksum and compare it between the source and the destination location. If the checksums don't match, it would be beneficial to avoid transferring the entire package again, so packing it into multiple files (a multi-part .7z archive) is a good idea, because then only the mismatching parts need to be re-transferred. This, however, requires a convenient way to calculate checksums of multiple files at once.

The script

My script is actually just a single line:

$ExtractName = @{l='Name';e={Split-Path $_.Path -Leaf}}; Get-FileHash -Path INPUT_MASK | Select-Object -Property Hash, $ExtractName > OUTPUT_FILE

To use it:

  1. Open a PowerShell console.
  2. Go to the directory with your archives to hash.
  3. Paste the command provided above. Before pressing ENTER:
    1. Replace "INPUT_MASK" with the mask of your files to hash. For example, if the archive files are named "Archive.7z.001", "Archive.7z.002", etc., you can type in "Archive.7z.*".
    2. Replace "OUTPUT_FILE" with the name or path of the output file to be created.
  4. Hit ENTER.

Example PowerShell session:

PS C:\Users\Adam Sawicki> cd E:\tmp\checksum_test\
PS E:\tmp\checksum_test> $ExtractName = @{l='Name';e={Split-Path $_.Path -Leaf}}; Get-FileHash -Path Archive.7z.* | Select-Object -Property Hash, $ExtractName > Checksums.txt
PS E:\tmp\checksum_test>

If the input files are large, it may take a few minutes to execute. After it completes, the output file "Checksums.txt" may look like this:

Hash                                                             Name          
----                                                             ----          
CBBABFB5529ACFB6AD67502F37444B9273A9B5BB7AF70EFA0FF1F1EC99B70895 Archive.7z.001
185D73ECBCECB9302981C97D0DDFC4B96198103436F23DB593EA9BAFBF997DAC Archive.7z.002
086640842CC34114B898D2E19270DCE427AC89D64BCD9E8E3D8D955D69588402 Archive.7z.003
BE536C66854530236DA924B1CAED44D0880D28AAA66420F6EBE5F363435BEB4F Archive.7z.004

You can then execute the same script on the destination machine of your transfer and compare the files and checksums to make sure they match.
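If you want to automate that comparison, one possible way (just a sketch, not part of the original script) is to copy the source-side result file next to the destination-side one, e.g. as "Checksums_source.txt" (a hypothetical name), and diff the two text files with Compare-Object. No output means all hashes and names match:

Compare-Object (Get-Content .\Checksums_source.txt) (Get-Content .\Checksums.txt)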


# How to programmatically check graphics driver version

Sat, 16 Dec 2023

This article is for you if you are a graphics programmer who develops for Windows using Direct3D 11, 12, or Vulkan, and you want to programmatically fetch the version of the graphics driver currently installed in your user's system. If you are in a hurry, you can jump straight to the recommended solution in the section "DXGI way" below. However, because this topic is non-trivial, I invite you to read the entire article, where I explain it comprehensively.
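To give a taste of the DXGI-based approach, here is a minimal sketch (not the complete solution described in the article): it asks the first adapter for its user-mode driver version via IDXGIAdapter::CheckInterfaceSupport. Error handling is omitted, and decoding the value into four 16-bit parts is my assumption about the usual driver version layout.

#include <windows.h>
#include <dxgi1_6.h>
#include <cstdio>
#pragma comment(lib, "dxgi.lib")

int main()
{
    IDXGIFactory6* factory = nullptr;
    CreateDXGIFactory2(0, IID_PPV_ARGS(&factory));

    IDXGIAdapter1* adapter = nullptr;
    factory->EnumAdapters1(0, &adapter); // First (default) adapter.

    // Despite taking an interface ID, this call also returns the user-mode driver version.
    LARGE_INTEGER umdVersion = {};
    adapter->CheckInterfaceSupport(__uuidof(IDXGIDevice), &umdVersion);

    printf("Driver version: %u.%u.%u.%u\n",
        HIWORD(umdVersion.HighPart), LOWORD(umdVersion.HighPart),
        HIWORD(umdVersion.LowPart), LOWORD(umdVersion.LowPart));

    adapter->Release();
    factory->Release();
    return 0;
}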


# Secrets of Direct3D 12: Do RTV and DSV descriptors make any sense?

Sun, 12 Nov 2023

This article is intended for programmers who use Direct3D 12. We will explore the topic of descriptors, especially Render Target View (RTV) and Depth Stencil View (DSV) descriptors. To understand the article, you should already know what they are and how to use them. For learning the basics, I recommend my earlier article “Direct3D 12: Long Way to Access Data”, where I described the resource binding model in D3D12. The current article is somewhat of a follow-up to that one. I also recommend checking the official “D3D12 Resource Binding Functional Spec”.

Descriptors in general

What is a “descriptor”? My personal definition would be that, generally in computing, a descriptor is a small data structure that points to some larger data and describes its parameters. While a “pointer”, “identifier”, or “key” is typically just a single number that points to or identifies the main object, a “descriptor” is typically a structure that also carries some parameters describing the object.

Descriptors in D3D12

Descriptors in D3D12 are also called “views”. They mean the same thing. Functions like ID3D12Device::CreateShaderResourceView or CreateRenderTargetView set up a descriptor. Note this is different from Vulkan, where a “view” and a “descriptor” are different entities. The concept of a “view” is also present in relational databases. Just like in databases, a “view” points to the target data, but also specifies a way to look at them. In D3D12 it means, for example, that an SRV descriptor pointing to a texture can reinterpret its pixel format (e.g. with or without _SRGB) or limit access to only a selected range of mip levels or array slices.
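As an illustration, here is a sketch of what such a view can express, assuming the texture was created with DXGI_FORMAT_R8G8B8A8_TYPELESS and that the variables device, texture, and srvCpuHandle (hypothetical names) already exist:

D3D12_SHADER_RESOURCE_VIEW_DESC srvDesc = {};
srvDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM_SRGB;      // Reinterpret the pixel format as sRGB.
srvDesc.ViewDimension = D3D12_SRV_DIMENSION_TEXTURE2D;
srvDesc.Shader4ComponentMapping = D3D12_DEFAULT_SHADER_4_COMPONENT_MAPPING;
srvDesc.Texture2D.MostDetailedMip = 2;                 // Expose only a selected range of mips:
srvDesc.Texture2D.MipLevels = 3;                       // mip levels 2, 3, 4.
device->CreateShaderResourceView(texture, &srvDesc, srvCpuHandle);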

Let’s talk about Constant Buffer View (CBV), Shader Resource View (SRV), and Unordered Access View (UAV) descriptors first. If created inside GPU-accessible descriptor heaps (class ID3D12DescriptorHeap, flag D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE), they can be bound to the graphics pipeline, as I described in detail in my previously mentioned article. Being part of GPU memory has some implications:


# Doing dynamic resolution scaling? Watch out for texture memory size!

Sun, 22 Oct 2023

This article is intended for graphics programmers, mostly those who use Direct3D 12 or Vulkan and implement dynamic resolution scaling. Before we go to the main topic, some introduction first…

Nowadays, more and more games offer some kind of resolution scaling. It means rendering the 3D scene in a resolution lower than the display resolution and then upscaling it using some advanced shader, often combined with temporal antialiasing and sharpening. It may be one of the solutions provided by GPU vendors (FSR from AMD, XeSS from Intel, DLSS from NVIDIA) or a custom solution (like TSR in Unreal Engine). It is an attractive option for gamers, offering a good FPS increase with only minor image quality degradation. It is becoming more important as monitor resolutions increase to 4K or even more, high-end graphics cards remain expensive, and advanced rendering techniques like ray tracing encourage favoring “better pixels” over “more pixels”. See also my old article: “Scaling is everywhere, pixel-perfect is the past”.

Dynamic resolution scaling is an extension of this idea that allows rendering each frame in a different resolution, lower or higher, as a trade-off between quality and performance, to maintain the desired framerate even in more complex scenes with many objects, characters, and particle effects visible on the screen. If you are interested in this technique, I strongly recommend checking a recent article from Martin Fuller from Microsoft: “Dynamic Resolution Scaling (DRS) Implementation Best Practice”, which provides many practical implementation tips.

One of the topics we need to handle when implementing dynamic resolution scaling is the creation and usage of textures that need a different resolution every frame, especially render targets, depth-stencil textures, and UAVs used temporarily between render passes. One solution could be to create these textures in the maximum resolution and use only part of them when necessary by means of a limited viewport. However, Martin gives multiple reasons why this option may cause problems. A simpler and safer solution is to create a separate texture for each possible resolution, with a certain step. In modern graphics APIs (Direct3D 12 and Vulkan) they can be placed in the same memory, which we call memory aliasing.

Here comes the main question I want to answer in this article: What size of memory heap should we use when allocating memory for these textures? Can we just take the maximum dimensions of a texture (e.g. 4K resolution: 3840 x 2160), call device->GetResourceAllocationInfo(), inspect the returned D3D12_RESOURCE_ALLOCATION_INFO::SizeInBytes, and use it as D3D12_HEAP_DESC::SizeInBytes? A texture with fewer pixels should always require less memory, right?
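Here is a minimal sketch of that naive query, assuming device is a valid ID3D12Device* and using an example render-target format; the names are illustrative only:

D3D12_RESOURCE_DESC desc = {};
desc.Dimension = D3D12_RESOURCE_DIMENSION_TEXTURE2D;
desc.Width = 3840;
desc.Height = 2160;
desc.DepthOrArraySize = 1;
desc.MipLevels = 1;
desc.Format = DXGI_FORMAT_R16G16B16A16_FLOAT;
desc.SampleDesc.Count = 1;
desc.Layout = D3D12_TEXTURE_LAYOUT_UNKNOWN;
desc.Flags = D3D12_RESOURCE_FLAG_ALLOW_RENDER_TARGET;

const D3D12_RESOURCE_ALLOCATION_INFO info = device->GetResourceAllocationInfo(0, 1, &desc);
// Tempting to use info.SizeInBytes directly as D3D12_HEAP_DESC::SizeInBytes,
// but as explained below, this is not safe.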

WRONG! Direct3D 12 doesn’t define such a requirement, and graphics drivers from some GPU vendors really do return a smaller required size for a texture with larger dimensions, for some specific dimensions and pixel formats. For example, on AMD Radeon RX 7900 XTX, a render target with format DXGI_FORMAT_R16G16B16A16_FLOAT returns:

Why does this happen? It is because textures are not necessarily stored in GPU memory the way we imagine them: pixel after pixel, in row-major order. They often use optimization techniques like pixel swizzling or compression. By “compression”, I don’t mean texture formats like BC or ASTC, which we must use explicitly. I also don’t mean compression like in the ZIP file format or the zlib/deflate algorithm, which decreases data size. Quite the opposite: this kind of compression increases texture size by adding extra metadata, which allows speeding things up by saving memory bandwidth in certain cases. This is done mostly on render target and depth-stencil textures. For more information about it, see my old article: “Texture Compression: What Can It Mean?”. I’m talking about meaning number 4 of the word “compression” from that article: compression formats that are internal, specific to certain graphics cards, and opaque to us, the programmers who just use the graphics API. The problem is that a specific compression format for a texture is selected by the driver based on various heuristics (like render target / depth-stencil / UAV / other flags, pixel format, and… dimensions). This is why a texture with larger dimensions may unexpectedly require less memory.

To research this problem in detail, I wrote a small testing program and performed tests on graphics cards from various vendors. It was a modification of my small Windows console app D3d12info that goes through the list of all DXGI_FORMAT enum values and calls CheckFeatureSupport to check which ones are supported as a render target or depth-stencil. For those that are, I called GetResourceAllocationInfo to get the memory requirements for a texture with this pixel format and increasing dimensions, where height goes from 32 to 2160 with a step of 8, and width is calculated using a formula for a 16:9 aspect ratio: width = height * 16 / 9.

Here are the results. Please remember these are just 3 specific graphics cards. The results may be different on a different GPU and even with a different version of the graphics driver.

On NVIDIA GeForce RTX 3080 with driver 545.84, I found no cases where a texture with larger dimensions requires less memory, so NVIDIA (or at least this specific card) is not affected by the problem described in this article.

On AMD Radeon RX 7900 XTX with driver 23.9.3, I found the following data points where memory requirements are non-monotonic, one for each of the following formats:

On Intel Arc A770, with driver 31.0.101.4887, almost every format used as a render target (but none of the depth-stencil formats) has multiple steps where the size decreases, and it has them at larger dimensions than AMD. For example, the most “traditional” one, DXGI_FORMAT_R8G8B8A8_UNORM, returns:

What to do with this knowledge? The conclusion is that if we implement dynamic resolution scaling and we want to create textures with different dimensions aliasing in memory, the required size of this memory is not necessarily the size of the largest texture in terms of dimensions. To be safe, we should query the memory requirements of all texture sizes we may want to use and calculate their maximum. In practice, it should be enough to query resolutions starting from e.g. 75% of the maximum. Because the tested GPUs always have only a single step down, an even more efficient, but not fully future-proof, solution could be to start from the full resolution, go down until we find a different memory size (no matter whether higher or lower), and take the maximum of these two.
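A sketch of the safe approach could look like this. The 75% lower bound, the 8-pixel step, and the example format follow the heuristics mentioned above and are illustrative only; adjust them to the set of resolutions your engine can actually use:

D3D12_RESOURCE_DESC desc = {}; // Same kind of desc as in the previous snippet.
desc.Dimension = D3D12_RESOURCE_DIMENSION_TEXTURE2D;
desc.DepthOrArraySize = 1;
desc.MipLevels = 1;
desc.Format = DXGI_FORMAT_R16G16B16A16_FLOAT;
desc.SampleDesc.Count = 1;
desc.Layout = D3D12_TEXTURE_LAYOUT_UNKNOWN;
desc.Flags = D3D12_RESOURCE_FLAG_ALLOW_RENDER_TARGET;

const UINT maxHeight = 2160;
const UINT minHeight = maxHeight * 3 / 4; // e.g. 75% of the maximum resolution
UINT64 requiredHeapSize = 0;
for(UINT height = maxHeight; height >= minHeight; height -= 8)
{
    desc.Width = (UINT64)height * 16 / 9;
    desc.Height = height;
    const D3D12_RESOURCE_ALLOCATION_INFO info = device->GetResourceAllocationInfo(0, 1, &desc);
    if(info.SizeInBytes > requiredHeapSize)
        requiredHeapSize = info.SizeInBytes;
}
// requiredHeapSize can now be used as D3D12_HEAP_DESC::SizeInBytes of the aliasing heap.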

So far, I have focused only on DirectX 12. Is Vulkan also affected by this problem? In the past, it could be. Vulkan has a similar concept of querying the memory requirements of a texture, using the function vkGetImageMemoryRequirements. It used to have an even bigger problem. To understand it, we must recall that in D3D12 we query for memory requirements (size and alignment) given a D3D12_RESOURCE_DESC structure, which describes the parameters of a texture to be created. In the initial Vulkan API, on the other hand, we need to first create the actual VkImage object and then query its memory requirements. The question is: given two textures created with exactly the same parameters (width, height, pixel format, number of mip levels, flags, etc.), do they always return the same memory requirements?

In the past, it wasn’t required by the Vulkan specification, and I saw drivers for some GPUs that really returned different sizes for two identical textures! It could cause problems, e.g. when defragmenting video memory in the Vulkan Memory Allocator library. Was it a bug, or another internal optimization done by the driver, e.g. to avoid some memory bank conflicts? I don’t know. The good news is that since then, the Vulkan specification has been clarified to require that functions like vkGetImageMemoryRequirements always return the same size and alignment for images created with the same parameters, and new drivers comply with that, so the problem is gone now. Vulkan 1.3 also got a new function, vkGetDeviceImageMemoryRequirements, that takes a VkImageCreateInfo with image creation parameters instead of an already created image object, just like D3D12 has done from the beginning.
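For completeness, here is a minimal sketch of that newer query, assuming a VkDevice named device created for Vulkan 1.3 (or with VK_KHR_maintenance4) and an example image description; no VkImage needs to exist:

VkImageCreateInfo imageInfo = { VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO };
imageInfo.imageType = VK_IMAGE_TYPE_2D;
imageInfo.format = VK_FORMAT_R16G16B16A16_SFLOAT;
imageInfo.extent = { 3840, 2160, 1 };
imageInfo.mipLevels = 1;
imageInfo.arrayLayers = 1;
imageInfo.samples = VK_SAMPLE_COUNT_1_BIT;
imageInfo.tiling = VK_IMAGE_TILING_OPTIMAL;
imageInfo.usage = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT | VK_IMAGE_USAGE_SAMPLED_BIT;

VkDeviceImageMemoryRequirements reqInfo = { VK_STRUCTURE_TYPE_DEVICE_IMAGE_MEMORY_REQUIREMENTS };
reqInfo.pCreateInfo = &imageInfo;

VkMemoryRequirements2 memReq = { VK_STRUCTURE_TYPE_MEMORY_REQUIREMENTS_2 };
vkGetDeviceImageMemoryRequirements(device, &reqInfo, &memReq);
// Size and alignment are now in memReq.memoryRequirements.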

Going back to the main question of this article: when the VK_KHR_maintenance4 extension is enabled (it has been promoted to core Vulkan 1.3), the problem does not occur, as the Vulkan specification says: "For a VkImage, the size memory requirement is never greater than that of another VkImage created with a greater or equal value in each of extent.width, extent.height, and extent.depth; all other creation parameters being identical." The same guarantee applies to buffers.

Big thanks to my friends: Bartek Boczula for discussions about this topic and inspiration to write this article, as well as Szymon Nowacki for testing on the Intel card! Also thanks to Constantine Shablia from Collabora for pointing me to the answer on Vulkan.


# 3 Ways to Iterate Over std::vector

Sat, 30 Sep 2023

This will be a short article about the basics of C++. std::vector is a container that dynamically allocates a contiguous array of elements. There are multiple ways to write a for loop to iterate over its elements. In 2018 I wrote an article, "Efficient way of using std::vector", where I compared their performance. I concluded that in the Debug configuration, using iterators can be orders of magnitude slower than using a raw pointer to the data. This time, I would like to focus on how using "modern" C++ also limits our freedom.

Language purists would probably say that the recommended way to traverse a vector is now a range-based for loop, available since C++11. This is indeed the shortest and most convenient form, but inside the loop it gives access only to the current element, not to its index or to any other elements.

struct Item
{
    int number;
    int otherData[10];
};

std::vector<Item> items = ...

int numberSum = 0;
for(const Item& item : items)
    numberSum += item.number;

Imagine that while traversing the vector, for some elements that are not the first and that meet certain criteria, we want to compare them with their previous element. This is not possible in the range-based for loop above, unless we memorize the previous element in a separate variable and update it on every iteration. Using iterators gives us the possibility to move forward or backward and thus to access the previous element when needed.

for(std::vector<Item>::const_iterator currIt = items.begin(); currIt != items.end(); ++currIt)
{
    if(currIt != items.begin() && // Not the first
        MeetsCriteria(*currIt))
    {
        std::vector<Item>::const_iterator prevIt = currIt;
        --prevIt; // Step back to the previous element
        CompareWithPrevious(*prevIt, *currIt);
    }
}

This is more flexible, but what if we want to insert some elements into the vector while traversing it? There is a trap awaiting here, because the insert method may invalidate all iterators when the underlying array gets reallocated. This is why only iterating using an index is safe here:

for(size_t index = 0; index < items.size(); ++index)
{
    Item newItem;
    if(NeedInsertItemBefore(items[index], &newItem))
    {
        items.insert(items.begin() + index, newItem);
        ++index; // Skip over the newly inserted element.
    }
}

Note that pretty much any modern programming language allows inserting and removing elements of a dynamic array using an index.

Only C++ requires clumsy syntax with iterators like items.begin() + index.

I know that the code fragments shown above can be written in many other ways, e.g. using the auto keyword. If you have an idea for writing any of these loops in a better way, please leave a comment below and let's discuss.


# ShaderCrashingAssert - a New Small Library

Sun, 20 Aug 2023

Last Thursday (August 17th) AMD released a new tool for post-mortem analysis of GPU crashes: Radeon GPU Detective. I participated in this project, but because this is my personal blog and because it is the weekend now, I am wearing my hobby-developer hat and want to present a small library that I developed yesterday:

ShaderCrashingAssert provides an assert-like macro for HLSL shaders that triggers a GPU memory page fault. Together with RGD, it can help with shader debugging.


