Tag: vulkan

Entries for tag "vulkan", ordered from most recent. Entry count: 27.

Pages: 1 2 3 4 >

# Texture Compression: What Can It Mean?

Mar 2020

"Data compression - the process of encoding information using fewer bits than the original representation." That's the definition from Wikipedia. But when we talk about textures (images that we use while rendering 3D graphics), it's not that simple. There are 4 different things we can mean by talking about texture compression, some of them you may not know. In this article, I'd like to give you some basic information about them.

1. Lossless data compression. That's the compression used to shrink binary data in size losing no single bit. We may talk about compression algorithms and libraries that implement them, like popular zlib or LZMA SDK. We may also mean file formats like ZIP or 7Z, which use these algorithms, but also define a way to pack multiple files with their whole directory structure into a single archive file.

Important thing to note here is that we can use this compression for any data. Some file types like text documents or binary executables have to be compressed in a lossless way so that no bits are lost or altered. You can also compress image files this way. Compression ratio depends on the data. The size of the compressed file will be smaller if there are many repeating patterns - the data look pretty boring, like many pixels with the same color. If the data is more varying, each next pixel has even slightly different value, then you may end up with a compressed file as large as original one or even larger. For example, following two images have size 480 x 480. When saved as uncompressed BMP R8G8B8 file, they both take 691,322 bytes. When compressed to a ZIP file, the first one is only 15,993, while the second one is 552,782 bytes.

We can talk about this compression in the context of textures because assets in games are often packed into archives in some custom format which protects the data from modification, speeds up loading, and may also use compression. For example, the new Call of Duty Warzone takes 162 GB of disk space after installation, but it has only 442 files because developers packed the largest data in some archives in files Data/data/data.000, 001 etc., 1 GB each.

2. Lossy compression. These are the algorithms that allow some data loss, but offer higher compression ratios than lossless ones. We use them for specific kinds of data, usually some media - images, sound, and video. For video it's virtually essential, because raw uncompressed data would take enormous space for each second of recording. Algorithms for lossy compression use the knowledge about the structure of the data to remove the information that will be unnoticeable or degrade quality to the lowest degree possible, from the perspective of human perception. We all know them - these are formats like JPEG for images and MP3 for music.

They have their pros and cons. JPEG compresses images in 8x8 blocks using Discrete Fourier Transform (DCT). You can find awesome, in-depth explanation of it on page: Unraveling the JPEG. It's good for natural images, but with text and diagrams it may fail to maintain desired quality. My first example saved as JPEG with Quality = 20% (this is very low, I usually use 90%) takes only 24,753 B, but it looks like this:

GIF is good for such synthetic images, but fails on natural images. I saved my second example as GIF with a color palette of 32 entries. The file is only 90,686 B, but it looks like this (look closer to see dithering used due to a limited number of colors):

Lossy compression is usually accompanied by lossless compression - file formats like JPEG, GIF, MP3, MP4 etc. compress the data losslessly on top of its core algorithm, so there is no point in compressing them again.

3. GPU texture compression. Here comes the interesting part. All formats described so far are designed to optimize data storage and transfer. We need to decompress all the textures packed in ZIP files or saved as JPEG before uploading them to video memory and using for rendering. But there are other types of texture compression formats that can be used by the GPU directly. They are lossy as well, but they work in a different way - they use a fixed number of bytes per block of NxN pixels. Thanks to this, a graphics card can easily pick right block from the memory and uncompress it on the fly, e.g. while sampling the texture. Some of such formats are BC1..7 (which stands for Block Compression) or ASTC (used on mobile platforms). For example, BC7 uses 1 byte per pixel, or 16 bytes per 4x4 block. You can find some overview of these formats here: Understanding BCn Texture Compression Formats.

The only file format I know which supports this compression is DDS, as it allows to store any texture that can be loaded straight to DirectX in various pixel formats, including not only block compressed but also cube, 3D, etc. Most game developers design their own file formats for this purpose anyway, to load them straight into GPU memory with no conversion.

4. Internal GPU texture compression. Pixels of a texture may not be stored in video memory the way you think - row-major order, one pixel after the other, R8G8B8A8 or whatever format you chose. When you create a texture with D3D12_TEXTURE_LAYOUT_UNKNOWN / VK_IMAGE_TILING_OPTIMAL (always do that, except for some very special cases), the GPU is free to use some optimized internal format. This may not be true "compression" by its definition, because it must be lossless, so the memory reserved for the texture will not be smaller. It may even be larger because of the requirement to store additional metadata. (That's why you have to take care of extra VK_IMAGE_ASPECT_METADATA_BIT when working with sparse textures in Vulkan.) The goal of these formats is to speed up access to the texture.

Details of these formats are specific to GPU vendors and may or may not be public. Some ideas of how a GPU could optimize a texture in its memory include:

How to make best use of those internal GPU compression formats if they differ per graphics card vendor and we don't know their details? Just make sure you leave the driver as much optimization opportunities as possible by:

See also article Delta Color Compression Overview at GPUOpen.com.

Summary: As you can see, the term "texture compression" can mean different things, so when talking about anything like this, always make sure to be clear what do you mean unless it's obvious from the context.

Comments | #rendering #vulkan #directx Share

# Vulkan Memory Allocator - budget management

Nov 2019

Querying for memory budget and staying within the budget is a very needed feature of the Vulkan Memory Allocator library. I implemented prototype of it on a separate branch "MemoryBudget".

It also contains documentation of all new symbols and a general chapter "Staying within budget" that describes this topic. Documentation is pregenerated so it can be accessed by just downloading the repository as ZIP, unpacking, and opening file "docs\html\index.html" > chapter “Staying within budget”.

If you are interested, please take a look. Any feedback is welcomed - you can leave your comment below or send me an e-mail. Now is the best time to adjust this feature to users' needs before it gets into the official release of the library.

Long story short:

Update 2019-12-20: This has been merged to master branch and shipped with the latest major release: Vulkan Memory Allocator 2.3.0.

Comments | #vulkan #libraries #productions Share

# Differences in memory management between Direct3D 12 and Vulkan

Jul 2019

Since July 2017 I develop Vulkan Memory Allocator (VMA) – a C++ library that helps with memory management in games and other applications using Vulkan. But because I deal with both Vulkan and DirectX 12 in my everyday work, I think it’s a good idea to compare them.

This is an article about a very specific topic. It may be useful to you if you are a programmer working with both graphics APIs – Direct3D 12 and Vulkan. These two APIs offer a similar set of features and performance. Both are the new generation, explicit, low-level interfaces to the modern graphics hardware (GPUs), so we could compare them back-to-back to show similarities and differences, e.g. in naming things. For example, ID3D12CommandQueue::ExecuteCommandLists function has Vulkan equivalent in form of vkQueueSubmit function. However, this article focuses on just one aspect – memory management, which means the rules and limitation of GPU memory allocation and the creation of resources – images (textures, render targets, depth-stencil surfaces etc.) and buffers (vertex buffers, index buffers, constant/uniform buffers etc.) Chapters below describe pretty much all the aspects of memory management that differ between the two APIs.

Read full article »

Comments | #vulkan #directx #gpu Share

# Vulkan: Long way to access data

Apr 2019

If you want to access some GPU memory in a shader, there are multiple levels of indirection that you need to go through. Understanding them is an important part of learning Vulkan API. Here is an explanation of this whole path.

Let’s take texture sampling as an example. We will start from shader code and go from there back to GPU memory where pixels of the texture are stored. If you write your shaders in GLSL language, you can use texture function to do sampling. You need to provide name of a sampler, as well as texture coordinates.

vec4 sampledColor = texture(sampler1, texCoords);

Earlier in the shader, the sampler needs to be defined. Together with this definition you need to provide index of a slot where this sampler and texture will be bound when the shader executes. Binding resources to slots under different numbers is a concept that exists in various graphics APIs for some time already. In Vulkan there are actually two numbers: index of a descriptor set and index of specific binding in that set. Sampler definition in GLSL may look like this:

layout(set=0, binding=1) uniform sampler2D sampler1;

What you bind to this slot is not the texture itself, but so-called descriptor. Descriptors are grouped into descriptor sets – objects of type VkDescriptorSet. They are allocated out of VkDescriptorPool (which we ignore here for simplicity) and they must comply with some VkDescriptorSetLayout. When defining layout of a descriptor set, you may specify that binding number 1 will contain combined image sampler. (This is just an example way of doing this. There are other possibilities, like descriptors of type: sampled image, sampler, storage image etc.)

VkDescriptorSetLayoutBinding binding1 = {};
binding1.binding = 1;
binding1.descriptorCount = 1;
binding1.stageFlags = VK_SHADER_STAGE_FRAGMENT_BIT;

VkDescriptorSetLayoutCreateInfo layoutInfo = {
layoutInfo.bindingCount = 1;
layoutInfo.pBindings = &binding1;

VkDescriptorSetLayout descriptorSetLayout1;
vkCreateDescriptorSetLayout(device, &layoutInfo, nullptr, &descriptorSetLayout1);

VkDescriptorSetAllocateInfo setAllocInfo = {
setAllocInfo.descriptorPool = descriptorPool1; // You need to have that already.
setAllocInfo.descriptorSetCount = 1;
setAllocInfo.pSetLayouts = &descriptorSetLayout1;

VkDescriptorSet descriptorSet1;
vkAllocateDescriptorSets(device, &setAllocInfo, &descriptorSet1);

When you have descriptor set layout created, as well as descriptor set based on it allocated, you need to bind the descriptor set as current one under set index 0 in the command buffer that you fill before you can issue a draw call that will use our shader. Function vkCmdBindDescriptorSets is defined for this purpose:

    0, // firstSet
    1, // descriptorSetCount
    0, // dynamicOffsetCount
    nullptr); // pDynamicOffsets

How do you setup the descriptor to point to a specific texture? There are multiple ways to do that. The most basic one is to use vkUpdateDescriptorSets function:

VkDescriptorImageInfo imageInfo = {};
imageInfo.sampler = sampler1;
imageInfo.imageView = imageView1;

VkWriteDescriptorSet descriptorWrite = {
descriptorWrite.dstSet = descriptorSet1;
descriptorWrite.dstBinding = 1;
descriptorWrite.descriptorCount = 1;
descriptorWrite.pImageInfo = &imageInfo;

    1, // descriptorWriteCount
    &descriptorWrite, // pDescriptorWrites
    0, // descriptorCopyCount
    nullptr); // pDescriptorCopies

Please note that this function doesn’t record a command to any command buffer. Descriptor update happens immediately. That’s why you need to do it before you submit your command buffer for execution on GPU and you need to keep this descriptor set alive and unchanged until the command buffer finishes execution.

There are other ways to update a descriptor set. You can e.g. use last two parameters of vkUpdateDescriptorSets function to copy descriptors (which is not recommended for performance reasons), as well as to use some extensions, e.g.: VK_KHR_push_descriptor, VK_KHR_descriptor_update_template.

What we write as value of the descriptor is reference to objects: imageView1 and sampler1. Let’s ignore the sampler and just focus on imageView1. This is an object of type VkImageView. Like in Direct3D 11, an image view is a simple object that encapsulates reference to an image along with a set of additional parameters that let you “view” the image in a certain way, e.g. limit access to a range of mipmap levels, array layers, or reinterpret it as different format.

VkImageViewCreateInfo viewInfo = {
viewInfo.image = image1;
viewInfo.viewType = VK_IMAGE_VIEW_TYPE_2D;
viewInfo.format = VK_FORMAT_R8G8B8A8_UNORM;
viewInfo.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
viewInfo.subresourceRange.baseMipLevel = 0;
viewInfo.subresourceRange.levelCount = 1;
viewInfo.subresourceRange.baseArrayLayer = 0;
viewInfo.subresourceRange.layerCount = 1;

VkImageView imageView1;
vkCreateImageView(device, &viewInfo, nullptr, &imageView1);

As you can see, image view object holds reference to image1. This is an object of type VkImage that represents actual resource, commonly called “texture” in other APIs. It is created from a rich set of parameters, like width, height, pixel format, number of mipmap levels etc.

VkImageCreateInfo imageInfo = {
imageInfo.imageType = VK_IMAGE_TYPE_2D;
imageInfo.extent.width = 1024;
imageInfo.extent.height = 1024;
imageInfo.depth = 1;
imageInfo.mipLevels = 1;
imageInfo.arrayLayers = 1;
imageInfo.format = VK_FORMAT_R8G8B8A8_UNORM;
imageInfo.tiling = VK_IMAGE_TILING_OPTIMAL;
imageInfo.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
iamgeInfo.usage =
imageInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE;
imageInfo.samples = VK_SAMPLE_COUNT_1_BIT;

VkImage image1;
vkCreateImage(device, &imageInfo, nullptr, &image1);

It’s not all yet. Unlike previous generation graphics APIs (Direct3D 9 or 11, OpenGL), image or buffer object doesn’t automatically allocate backing memory for its data. You need to do it on your own. What you actually need to do is to first query the image for memory requirements (required size and alignment), then allocate memory block for it and finally bind those two together. Only then the image is usable as a means of accessing the memory, interpreted as colorful pixels of a 2D picture.

VkMemoryRequirements memReq;
vkGetImageMemoryRequirements(device, image1, &memReq);

VkMemoryAllocateInfo allocInfo = {
allocInfo.allocationSize = memReq.size;
allocInfo.memoryTypeIndex = 0; // You need to find appropriate index!

VkDeviceMemory memory1;
vkAllocateMemory(device, &allocInfo, nullptr, &memory1);

    0); // memoryOffset

In production quality code you should of course check for error codes, e.g. when your allocation fails because you run out of memory. You should also avoid allocating separate memory blocks for each of your images and buffers. It is necessary to allocate bigger memory blocks and manage them manually, assigning parts of them to your resources. You can use last parameter of the binding function to provide offset in bytes from the start of a memory block. You can also simplify this part by using existing library: Vulkan Memory Allocator.

Comments | #graphics #vulkan Share

# Vulkan Memory Allocator Survey March 2019

Mar 2019

Are you a software developer, use Vulkan and the Vulkan Memory Allocator library (or at least considered using it)? If so, please spend a few minutes and help to shape the future of the library by participating in the survey:

» Vulkan Memory Allocator Survey March 2019

Your feedback is greatly appreciated. The survey is anonymous - no personal data is collected like name, e-mail etc. All questions are optional.

Comments | #productions #libraries #vulkan Share

# How to design API of a library for Vulkan?

Feb 2019

In my previous blog post yesterday, I shared my thoughts on graphics APIs and libraries. Another problem that brought me to these thoughts is a question: How do you design an API for a library that implements a single algorithm, pass, or graphics effect, using Vulkan or DX12? It may seem trivial at first, like a task that just needs to be designed and implemented, but if you think about it more, it turns out to be a difficult issue. They are few software libraries like this in existence. I don’t mean here a complex library/framework/engine that “horizontally” wraps the entire graphics API and takes it to a higher level, like V-EZ, Nvidia Falcor, or Google Filament. I mean just a small, “vertical”, plug-in library doing one thing, e.g. implementing ambient occlusion effect, efficient texture mipmap down-sampling, rendering UI, or simulating particle physics on the GPU. Such library needs to interact efficiently with the rest of the user’s code to be part of a large program or game. Vulkan Memory Allocator is also not a good example of this, because it only manages memory, implements no render passes, involves no shaders, and it interacts with a command buffer only in its part related to memory defragmentation.

I met this problem at my work. Later I also discussed it in details with my colleague. There are multiple questions to consider:

This is a problem similar to what we have with any C++ libraries. There is no consensus about the implementation of various basic facilities, like strings, containers, asserts, mutexes etc., so every major framework or game engine implements its own. Even something so simple as min/max function is defined is multiple places. It is defined once in <algorithm> header, but some developers don’t use STL. <Windows.h> provides its own, but these are defined as macros, so they break any other, unless you #define NOMINMAX before the include… A typical C++ nightmare. Smaller libraries are better just configurable or define their own everything, like the Vulkan Memory Allocator having its own assert, vector (can be switched to standard STL one), and 3 versions of read-write mutex.

All these issues make it easier for developers to just write a paper, describe their algorithm, possibly share a piece of code, pseudo-code or a shader, rather than provide ready to use library. This is a very bad situation. I hope that over time patterns emerge of how the API of a library implementing a single pass or effect using Vulkan/DX12 should look like. Recently my colleague shared an idea with me that if there was some higher-level API that would implement all these interactions between various parts (like resource allocation, image barriers) and we all commonly agreed on using it, then authoring libraries and stitching them together on top of it would be way easier. That’s another argument for the need of such new, higher-level graphics API.

Comments | #gpu #vulkan #directx #libraries #graphics #c++ Share

# Thoughts on graphics APIs and libraries

Feb 2019

Warning: This is a long rant. I’d like to share my personal thoughts and opinions on graphics APIs like Vulkan, Direct3D 12.

Some time ago I came up with a diagram showing how the graphics software technologies evolved over last decades – see my blog post “Lower-Level Graphics API - What Does It Mean?”. The new graphics APIs (Direct3D 12, Vulkan, Metal) are not only a clean start, so they abandon all the legacy garbage going back to ‘90s (like glVertex), but they also take graphics programming to a new level. It is a lower level – they are more explicit, closer to the hardware, and better match how modern GPUs work. At least that’s the idea. It means simpler, more efficient, and less error-prone drivers. But they don’t make the game or engine programming simpler. Quite the opposite – more responsibilities are now moved to engine developers (e.g. memory management/allocation). Overall, it is commonly considered a good thing though, because the engine has higher-level knowledge of its use cases (e.g. which textures are critically important and which can be unloaded when GPU memory is full), so it can get better performance by doing it properly. All this is hidden in the engines anyway, so developers making their games don’t notice the difference.

Those of you, who – just like me – deal with those low-level graphics APIs in their everyday work, may wonder if these APIs provide the right level of abstraction. I know it will sound controversial, but sometimes I get a feeling they are at the exactly worst possible level – so low they are difficult to learn and use properly, while so high they still hide some implementation details important for getting a good performance. Let’s take image/texture barriers as an example. They were non-existent in previous APIs. Now we have to do them, which is a major pain point when porting old code to a new API. Do too few of them and you get graphical corruptions on some GPUs and not on the others. Do too many and your performance can be worse than it has been on DX11 or OGL. At the same time, they are an abstract concept that still hides multiple things happening under the hood. You can never be sure which barrier will flush some caches, stall the whole graphics pipeline, or convert your texture between internal compression formats on a specific GPU, unless you use some specialized, vendor-specific profiling tool, like Radeon GPU Profiler (RGP).

It’s the same with memory. In DX11 you could just specify intended resource usage (D3D11_USAGE_IMMUTABLE, D3D11_USAGE_DYNAMIC) and the driver chose preferred place for it. In Vulkan you have to query for memory heaps available on the current GPU and explicitly choose the one you decide best for your resource, based on low-level flags like VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT etc. AMD exposes 4 memory types and 3 memory heaps. Nvidia has 11 types and 2 heaps. Intel integrated graphics exposes just 1 heap and 2 types, showing the memory is really unified, while AMD APU, also integrated, has same memory model as the discrete card. If you try to match these to what you know about physically existing video RAM and system RAM, it doesn’t make any sense. You could just pick the first DEVICE_LOCAL memory for the fastest GPU access, but even then, you cannot be sure your resource will stay in video RAM. It may be silently migrated to system RAM without your knowledge and consent (e.g. if you go out of memory), which will degrade performance. What is more, there is no way to query for the amount of free GPU memory in Vulkan, unless you do hacks like using DXGI.

Hardware queues are no better. Vulkan claims to give explicit access to the pieces of GPU hardware, so you need to query for queues that are available. For example, Intel exposes only a single graphics queue. AMD lets you create up to 3 additional compute-only queues and 2 transfer queues. Nvidia has 8 compute queues and 1 transfer queue. Do they all really map to silicon that can work in parallel? I doubt it. So how many of them to use to get the best performance? There is no way to tell by just using Vulkan API. AMD promotes doing compute work in parallel with 3D rendering while Nvidia diplomatically advises to be “conscious” with it.

It's the same with presentation modes. You have to enumerate VkPresentModeKHR-s available on the machine and choose the right one, along with number of images in the swapchain. These don't map intuitively to a typical user-facing setting of V-sync = on/off, as they are intended to be low level. Still you have no control and no way to check whether the driver does "blit" or "flip".

One could say the new APIs don’t deliver to their promise of being low level, explicit, and having predictable performance. It is impossible to deliver, unless the API is specific to one GPU, like there is on consoles. A common API over different GPUs is always high level, things happen under the hood, and there are still fast and slow paths. Isn’t all this complexity just for nothing? It may be true that comparing to previous generation APIs, drivers for the new ones need not launch additional threads in the background or perform shader compilation on first draw call, which greatly reduces chances of major hitching. (We will see how long this state will persist as the APIs and drivers evolve.) * Still there is no way to predict or ensure minimum FPS/maximum frame time. We are talking about systems where multiple processes compete for resources. On modern PCs there is even no way to know how many cycles will a single instruction take! Cache memory, branch prediction, out-of-order execution – all of these mechanisms are there in the CPU to speed up average cases, but there can always be cases when it works slowly (e.g. cache miss). It’s the same with graphics. I think we should abandon the false hope of predictable performance as a thing of the past, just like rendering graphics pixel-perfect. We can optimize for the average, but we cannot ensure the minimum. After all, games are “soft real-time systems”.

Based on that, I am thinking if there is a room for a new graphics API or top of DX12 or Vulkan. I don’t mean whole game engine with physical simulation, handling sound, input controllers and all, like Unity or UE4. I mean an API just like DX11 or OGL, on a similar or higher abstraction level (if higher level, maybe the concept of persistent “frame graph” with explicit pass and resource dependencies is the way to go?). I also don’t think it’s enough to just reimplement any of those old APIs. The new one should take advantage of features of the explicit APIs (like parallel command buffer recording), while hiding the difficult parts (e.g. queues, memory types, descriptors, barriers), so it’s easier to use and harder to misuse. (An existing library similar to this concept is V-EZ from AMD.) I think it may still have good performance. The key thing needed for creation of such library is abandoning the assumption that developer must define everything up-front, with nothing allocated, created, or transferred on first use.

See also next post: "How to design API of a library for Vulkan?"

Update 2019-02-12: I want to thank all of you for the amazing feedback I received after publishing this post, especially on Twitter. Many projects have been mentioned that try to provide an API better than Vulkan or DX12 - e.g. Apple Metal, WebGPU, The Forge by Confetti.

* Update 2019-04-16: Microsoft just announced they are adding background shader optimizations to D3D12, so driver can recompile and optimize shaders in the background on its own threads. Congratulations! We are back at D3D11 :P

Comments | #vulkan #directx #libraries #graphics #optimization #gpu Share

# Vulkan sparse binding - a quick overview

Dec 2018

On 13 December 2018 AMD released new version of graphics driver: 18.12.2. One could say there is nothing special about it – new drivers are released as often as every 2 weeks. But this version brings a change very important for Vulkan developers. AMD now catches up with competition – Nvidia and Intel – in supporting Vulkan sparse binding and sparse residency, which makes it usable in Windows PC games. I’d like to make it an opportunity to briefly describe what is it and how can you start using it, assuming you are a programmer who already knows a bit about C or C++ and Vulkan.

Read full entry > | Comments | #graphics #vulkan Share

Pages: 1 2 3 4 >

[Download] [Dropbox] [pub] [Mirror] [Privacy policy]
Copyright © 2004-2020