# How to Correctly Interpolate Vertex Attributes on a Parallelogram Using Modern GPUs?

Feb 2020

This is probably the first guest post ever on my blog. It has been written by my friend Łukasz Izdebski Ph.D.

In a nutshell, today’s graphics cards render meshes using only triangles, as shown in the picture below.

I don’t want to describe how it is done (a lot of information on this topic can be easily found on the Internet) or describe the whole graphics pipeline, but here is a short recap: In the first programmable part of the rendering pipeline, the vertex shader receives a single vertex (with assigned data about it that I will refer to later) and outputs one vertex transformed by the shader. Usually it is a transformation from a 3D coordinate system to clip space, which becomes Normalized Device Coordinates (NDC) after the perspective divide (information about NDC can be found in Coordinate Systems). After primitive clipping, perspective divide, and viewport transform, vertices are projected onto the 2D screen, which is drawn to the monitor output.

But this is only half the story. So far I have described what happens with vertices, but at the beginning I mentioned triangles, with three vertices forming one triangle. After vertex processing comes Primitive Assembly, and the next stage is Rasterization. This stage is very important because it generates the fragments (pixels) that lie inside the triangle, as shown in the picture below.

How are colors of the pixels inside the triangle generated? Each vertex can contain not only its 3D coordinates in the virtual world, but also additional data called attributes. Those attributes can be the color of the vertex, normal vector, texture coordinates etc.

How then are those rainbow colors generated, as shown in the picture above, when as I said only one color can be set on each of the three vertices of the triangle? The answer is interpolation. As we can read in Wikipedia, interpolation in mathematics is a type of estimation, a method that can be used to generate new data points between a discrete set of known data points. In the described problem it’s about generating interpolated colors inside the rendered triangle.

The way to achieve this is by using barycentric coordinates. (Those coordinates not only can be used for interpolation but additionally define if a fragment lies inside the triangle or not. More on this topic can be read in Rasterization: a Practical Implementation). In short, barycentric coordinates are a set of three numbers λ1, λ2, λ3 where λ1 + λ2 + λ3 = 1, and all three are ≥ 0 for points inside the triangle. When a triangle is rasterized, proper barycentric coordinates are calculated for every fragment of the triangle. Then the color of the fragment C can be calculated as a weighted sum of the colors at each vertex C1, C2, C3, where the weights are of course the barycentric coordinates: C = C1 * λ1 + C2 * λ2 + C3 * λ3
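To make the formula concrete, here is a minimal CPU-side sketch in C++. It is not how any particular GPU implements rasterization; the names `Vec2`, `Color`, `EdgeFunction`, `Barycentric`, `IsInside` and `Interpolate` are made up for this illustration, and the triangle is assumed to be given in 2D screen space.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

struct Vec2 { float x, y; };
using Color = std::array<float, 4>; // RGBA

// Twice the signed area of the triangle (a, b, c).
float EdgeFunction(Vec2 a, Vec2 b, Vec2 c)
{
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Barycentric coordinates (l1, l2, l3) of point p relative to triangle (v1, v2, v3).
// Each weight is the area of the sub-triangle opposite to its vertex,
// divided by the area of the whole triangle, so the weights sum to 1.
std::array<float, 3> Barycentric(Vec2 v1, Vec2 v2, Vec2 v3, Vec2 p)
{
    const float area = EdgeFunction(v1, v2, v3);
    assert(area != 0.0f && "degenerate triangle");
    return {
        EdgeFunction(v2, v3, p) / area,
        EdgeFunction(v3, v1, p) / area,
        EdgeFunction(v1, v2, p) / area };
}

// A point lies inside the triangle when no weight is negative.
bool IsInside(const std::array<float, 3>& l)
{
    return l[0] >= 0.0f && l[1] >= 0.0f && l[2] >= 0.0f;
}

// C = C1 * l1 + C2 * l2 + C3 * l3, applied to every color channel.
Color Interpolate(const Color& c1, const Color& c2, const Color& c3,
                  const std::array<float, 3>& l)
{
    Color out{};
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = c1[i] * l[0] + c2[i] * l[1] + c3[i] * l[2];
    return out;
}
```

For example, with vertices (0,0), (4,0), (0,4) colored red, green and blue, the point (1,1) gets weights (0.5, 0.25, 0.25) and therefore a color that is half red, a quarter green and a quarter blue.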

This way not only color can be interpolated, but any vertex attribute. (This functionality can be disabled by using a proper Interpolation Qualifier on an attribute in the shader source code, like flat.)

When dealing with triangle meshes, this way of attribute interpolation gives correct results, but when rendering 2D sprites or font glyphs, some disadvantages may occur under specific circumstances. When we want to render a gradient which starts in one of the corners of a sprite (see the picture below), we can see quite ugly results :(

This is happening because interpolation occurs on two triangles independently of each other. Graphics cards can work only on triangles, not quads. In this case we want the interpolation to occur on a quad, not a triangle, as pictured below:

How to trick the graphics card and force quadrilateral attribute interpolation? One way to render this type of gradient is by using tessellation – subdividing the quad geometry. Tessellation shaders are available starting from DirectX 11, OpenGL 4.0, and Vulkan 1.0. A simple example of how it looks depending on different tessellation parameters (more details about tessellation can be found in Tessellation Stages) can be seen in the animated picture below.

As we can see, when the quad is subdivided into more than 16 pieces, we get the desired visual result, but as my high school teacher used to say: “Don’t shoot a fly with a cannon”, and using tessellation to render such a simple thing is overkill.

That is why I developed a new technique that helps achieve this goal. First, we need to get access to the barycentric coordinates in the fragment shader. DirectX 12 HLSL gives us those coordinates when using SV_Barycentrics. In Vulkan those coordinates are available in the AMD VK_AMD_shader_explicit_vertex_parameter extension and the Nvidia VK_NV_fragment_shader_barycentric extension. Maybe in the near future they will be available in the core spec of Vulkan and, more importantly, from all hardware vendors.

If we are not fortunate enough to have those coordinates as built-in inputs, we can generate them by adding some additional data: one new vertex attribute and one new uniform (constant) value. Here are the details of this solution. Consider a quadrilateral built from four vertices and two triangles as shown in the picture below.

The additional vertex attribute Barycentric is a 2D vector and should contain the following values:

A = (1,0) B = (0,0) C = (0,1) D = (0,0)

The next step is to calculate extra constant data for the parameter to be interpolated (in this case the color attribute), as shown in the picture above, using the equation:

ExtraColorData = - ColorAtVertexA +  ColorAtVertexB - ColorAtVertexC + ColorAtVertexD

The fragment shader that renders the interpolation we are looking for looks like this:

//GLSL Fragment Shader
#version 450
#extension GL_ARB_separate_shader_objects : enable

layout(binding = 0) uniform CONSTANT_BUFFER {
    vec4 ExtraColorData;
} cbuffer;

in block {
    vec4 Color;
    vec2 Barycentric;
} PSInput;

layout(location = 0) out vec4 SV_TARGET;

void main() {
    SV_TARGET = PSInput.Color + PSInput.Barycentric.x * PSInput.Barycentric.y * cbuffer.ExtraColorData;
}

//HLSL Pixel Shader
cbuffer CONSTANT_BUFFER : register(b0) {
    float4 ExtraColorData;
};

struct PSInput {
    float4 color : COLOR;
    float2 barycentric : TEXCOORD0;
};

float4 PSMain(PSInput input) : SV_TARGET {
    return input.color + input.barycentric.x * input.barycentric.y * ExtraColorData;
}

That’s all. As we can see, it should not have a big performance overhead. Once barycentric coordinates become more widely available, the memory overhead will also be minimal.

The reader will probably ask: does this look good in a 3D perspective scenario too, not only when a triangle is parallel to the screen?

As shown in the picture above, the hardware properly interpolates the data using additional computation, as described here (Rasterization: a Practical Implementation), so this new method works with perspective as it should.
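The claim that the correction term turns per-triangle interpolation into bilinear interpolation can be checked on the CPU. The sketch below is only a sanity check under stated assumptions, not part of the technique itself: the quad is a parallelogram with D opposite to B, it is split into triangles ABC and ACD (inferred from B and D both carrying the value (0,0)), and all function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Color = std::array<float, 4>; // RGBA

// ExtraColorData = -A + B - C + D, computed once per quad on the CPU.
Color ComputeExtraColorData(const Color& a, const Color& b,
                            const Color& c, const Color& d)
{
    Color extra{};
    for (std::size_t i = 0; i < extra.size(); ++i)
        extra[i] = -a[i] + b[i] - c[i] + d[i];
    return extra;
}

// What the fragment shader receives after hardware interpolation.
struct FragmentInput
{
    Color color;        // interpolated linearly within one triangle
    float baryX, baryY; // interpolated Barycentric attribute
};

// Emulates the rasterizer at point P = B + s * (A - B) + t * (C - B), 0 <= s, t <= 1.
// Assumption: the quad is split into triangles ABC and ACD.
FragmentInput EmulateRasterizer(const Color& a, const Color& b, const Color& c,
                                const Color& d, float s, float t)
{
    assert(s >= 0.0f && s <= 1.0f && t >= 0.0f && t <= 1.0f);
    float wa, wb, wc, wd; // barycentric weights of vertices A, B, C, D
    if (s + t <= 1.0f) { wa = s;        wb = 1.0f - s - t; wc = t;        wd = 0.0f; }          // triangle ABC
    else               { wa = 1.0f - t; wb = 0.0f;         wc = 1.0f - s; wd = s + t - 1.0f; }  // triangle ACD
    FragmentInput in{};
    for (std::size_t i = 0; i < in.color.size(); ++i)
        in.color[i] = wa * a[i] + wb * b[i] + wc * c[i] + wd * d[i];
    in.baryX = wa; // A = (1,0), so x carries the weight of A
    in.baryY = wc; // C = (0,1), so y carries the weight of C
    return in;
}

// Same arithmetic as the fragment shaders: color + x * y * ExtraColorData.
Color ShadeFragment(const FragmentInput& in, const Color& extra)
{
    Color out{};
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = in.color[i] + in.baryX * in.baryY * extra[i];
    return out;
}

// Reference result: bilinear interpolation over the whole quad.
Color Bilinear(const Color& a, const Color& b, const Color& c,
               const Color& d, float s, float t)
{
    Color out{};
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = (1.0f - s) * (1.0f - t) * b[i] + s * (1.0f - t) * a[i]
               + (1.0f - s) * t * c[i] + s * t * d[i];
    return out;
}
```

For any (s, t) in the unit square, `ShadeFragment(EmulateRasterizer(...), extra)` should match `Bilinear(...)` up to floating-point rounding, in both triangles.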

Does this method give proper results only on squares?

This is a good question! The solution described above works on all parallelograms. I’m now working on a solution for all convex quadrilaterals.

What else can we use this method for?

One more use comes to my mind: a post-process fullscreen quad. As I mentioned earlier, graphics cards do not render quads, but triangles. To get proper interpolation of attributes, 3D engines render one BIG triangle which covers the whole screen. With this new approach, rendering a quad built from two triangles becomes possible, and attributes which need quadrilateral interpolation can be calculated in the way shown above.
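For reference, the "one BIG triangle" trick is usually done by deriving the positions from the vertex index alone. The C++ sketch below shows only that arithmetic; the function name is made up, and engines typically do the same computation in the vertex shader.

```cpp
#include <cassert>

struct Vec2 { float x, y; };

// NDC position of vertex 0, 1 or 2 of a single triangle that covers
// the whole [-1, 1] x [-1, 1] screen: (-1,-1), (3,-1), (-1,3).
Vec2 FullscreenTrianglePosition(unsigned vertexIndex)
{
    assert(vertexIndex < 3);
    const float u = static_cast<float>((vertexIndex << 1) & 2u); // 0, 2, 0
    const float v = static_cast<float>(vertexIndex & 2u);        // 0, 0, 2
    return { u * 2.0f - 1.0f, v * 2.0f - 1.0f };
}
```

The parts of the triangle outside the screen are clipped, so every visible pixel is covered by exactly one triangle and there is no diagonal seam.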


# How Do Graphics Cards Execute Vector Instructions?

Jan 2020

Intel announced that together with their new graphics architecture they will provide a new API, called oneAPI, that will allow programming GPU, CPU, and even FPGA in a unified way, and will support SIMD as well as SIMT mode. If you are not sure what it means but you want to be prepared for it, read this article. Here I try to explain concepts like SIMD, SIMT, AoS, SoA, and vector instruction execution on CPU and GPU. I think it may be interesting to you as a programmer even if you don't write shaders or GPU computations. Also, don't worry if you don't know any assembly language - the examples below are simple and should be understandable anyway. Below I will show three examples:

1. CPU, scalar

Let's say we write a program that operates on a numerical value. The value comes from somewhere and before we pass it for further processing, we want to execute the following logic: if it's negative (less than zero), increase it by 1. In C++ it may look like this:

float number = ...;
bool needsIncrease = number < 0.0f;
if(needsIncrease)
  number += 1.0f;

If you compile this code in Visual Studio 2019 for 64-bit x86 architecture, you may get the following assembly (with comments after semicolon added by me):

00007FF6474C1086 movss  xmm1,dword ptr [number]   ; xmm1 = number
00007FF6474C108C xorps  xmm0,xmm0                 ; xmm0 = 0
00007FF6474C108F comiss xmm0,xmm1                 ; compare xmm0 with xmm1, set flags
00007FF6474C1092 jbe    main+32h (07FF6474C10A2h) ; jump to 07FF6474C10A2 depending on flags
00007FF6474C1094 addss  xmm1,dword ptr [__real@3f800000 (07FF6474C2244h)]  ; xmm1 += 1
00007FF6474C109C movss  dword ptr [number],xmm1   ; number = xmm1
00007FF6474C10A2 ...

There is nothing special here, just normal CPU code. Each instruction operates on a single value.

2. CPU, vector

Some time ago vector instructions were introduced to CPUs. They allow operating on many values at a time, not just a single one. For example, the CPU vector extension called Streaming SIMD Extensions (SSE) is accessible in Visual C++ using data types like __m128 (which can store a 128-bit value representing e.g. 4x 32-bit floating-point numbers) and intrinsic functions like _mm_add_ps (which can add two such variables per-component, outputting a new vector of 4 floats as a result). We call this approach Single Instruction Multiple Data (SIMD), because one instruction operates not on a single numerical value, but on a whole vector of such values in parallel.

Let's say we want to implement the following logic: given some vector (x, y, z, w) of 4x 32-bit floating point numbers, if its first component (x) is less than zero, increase the whole vector per-component by (1, 2, 3, 4). In Visual C++ we can implement it like this:

const float constant[] = {1.0f, 2.0f, 3.0f, 4.0f};
__m128 number = ...;
float x; _mm_store_ss(&x, number);
bool needsIncrease = x < 0.0f;
if(needsIncrease)
  number = _mm_add_ps(number, _mm_loadu_ps(constant));

Which gives the following assembly:

00007FF7318C10CA  comiss xmm0,xmm1  ; compare xmm0 with xmm1, set flags
00007FF7318C10CD  jbe    main+69h (07FF7318C10D9h)  ; jump to 07FF7318C10D9 depending on flags
00007FF7318C10CF  movaps xmm5,xmmword ptr [__xmm@(...) (07FF7318C2250h)]  ; xmm5 = (1, 2, 3, 4)
00007FF7318C10D6  addps  xmm5,xmm1  ; xmm5 = xmm5 + xmm1
00007FF7318C10D9  movaps xmm0,xmm5  ; xmm0 = xmm5

This time xmm registers are used to store not just single numbers, but vectors of 4 floats. A single instruction - addps (as opposed to addss used in the previous example) adds 4 numbers from xmm1 to 4 numbers in xmm5.

It may seem obvious, but it's important for future considerations to note that the condition here and the boolean variable driving it (needsIncrease) is not a vector, but a single value, calculated based on the first component of vector number. Such a single value in the SIMD world is also called a "scalar". Based on it, the condition is true or false and the branch is taken or not, so either the whole vector is increased by (1, 2, 3, 4), or nothing happens. This is how CPUs work, because we execute just one program, with one thread, which has one instruction pointer to execute its instructions sequentially.

3. GPU

Now let's move on from the CPU world to the world of graphics processors (GPUs). Those are programmed in different languages. One of them is GLSL, used in OpenGL and Vulkan graphics APIs. In this language there is also a data type that holds 4x 32-bit floating-point numbers, called vec4. You can add a vector to a vector per-component using just the '+' operator.

The same logic as in section 2, implemented in GLSL, looks like this:

vec4 number = ...;
bool needsIncrease = number.x < 0.0;
if(needsIncrease)
  number += vec4(1.0, 2.0, 3.0, 4.0);

When you compile a shader with such code for an AMD GPU, you may see the following GPU assembly (for offline shader compilation I used Radeon GPU Analyzer (RGA), a free tool from AMD):

v_add_f32      v5, 1.0, v2      ; v5 = v2 + 1
v_add_f32      v1, 2.0, v3      ; v1 = v3 + 2
v_cmp_gt_f32   vcc, 0, v2       ; compare v2 with 0, set flags
v_cndmask_b32  v2, v2, v5, vcc  ; override v2 with v5 depending on flags
v_add_f32      v5, lit(0x40400000), v4  ; v5 = v4 + 3
v_cndmask_b32  v1, v3, v1, vcc  ; override v1 with v3 depending on flags
v_add_f32      v3, 4.0, v0      ; v3 = v0 + 4
v_cndmask_b32  v4, v4, v5, vcc  ; override v4 with v5 depending on flags
v_cndmask_b32  v3, v0, v3, vcc  ; override v3 with v0 depending on flags

You can see something interesting here: Although the high-level shader language is vector-based, the actual GPU assembly operates on individual vector components (x, y, z, w) using separate instructions and stores their values in separate registers like (v2, v3, v4, v0). Does it mean GPUs don't support vector instructions?!

Actually, they do, but differently. The first GPUs from decades ago (right after they became programmable with shaders) really operated on those vectors in the way we see them. Nowadays, it's true that what we treat as vector components (x, y, z, w) or color components (R, G, B, A) in the shaders we write becomes separate values. But GPU instructions are still vector, as denoted by their prefix "v_". The SIMD in GPUs is used to process not a single vertex or pixel, but many of them (e.g. 64) at once. It means that a single register like v2 stores 64x 32-bit numbers and a single instruction like v_add_f32 adds per-component 64 of such numbers - just Xs or Ys or Zs or Ws, one for each pixel calculated in a separate SIMD lane.

Some people call it Structure of Arrays (SoA) as opposed to Array of Structures (AoS). The term comes from imagining how such a data structure would be laid out in memory. If we were to define it in C, the way we see it when programming in GLSL is an array of structures:

struct {
  float x, y, z, w;
} number[64];

While the way the GPU actually operates is kind of a transpose of this - a structure of arrays:

struct {
  float x[64], y[64], z[64], w[64];
} number;
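The "transpose" between the two layouts can be written out explicitly. This is just an illustrative C++ sketch: the lane count of 64 follows the example above, and the type and function names (`NumberAoS`, `NumberSoA`, `ToSoA`) are invented here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLaneCount = 64; // number of pixels processed together (assumed)

struct Vec4 { float x, y, z, w; };              // what the shader author sees
using NumberAoS = std::array<Vec4, kLaneCount>; // array of structures

struct NumberSoA                                // structure of arrays
{
    std::array<float, kLaneCount> x, y, z, w;   // each member acts like one vector register
};

// Gathers the same component of every element into one contiguous array.
NumberSoA ToSoA(const NumberAoS& aos)
{
    NumberSoA soa{};
    for (std::size_t lane = 0; lane < kLaneCount; ++lane)
    {
        soa.x[lane] = aos[lane].x;
        soa.y[lane] = aos[lane].y;
        soa.z[lane] = aos[lane].z;
        soa.w[lane] = aos[lane].w;
    }
    return soa;
}
```

After the conversion, `soa.x` holds all 64 Xs next to each other, which is the shape a single vector instruction operates on.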

It comes with an interesting implication if you consider the condition we do before the addition. Please note that we write our shader as if we calculated just a single vertex or pixel, without even having to know that 64 of them will execute together in a vector manner. It means we have 64 Xs, Ys, Zs, and Ws. The X component of each pixel can be less or not less than 0, meaning that for each SIMD lane the condition may be fulfilled or not. So boolean variable needsIncrease inside the GPU is not a scalar, but also a vector, having 64 individual boolean values - one for each pixel! Each pixel may want to enter the if clause or skip it. That's what we call Single Instruction Multiple Threads (SIMT), and that's how real modern GPUs operate. How is it implemented if some threads want to do if and others want to do else? That's a different story...
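The per-lane condition can be emulated on the CPU to see what the GPU assembly above is doing. In this C++ sketch a `std::bitset` plays the role of the `vcc` mask and a `std::array` plays the role of a vector register; these names and the lane count of 64 are assumptions for illustration, not a model of any specific hardware.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLaneCount = 64; // assumed number of SIMD lanes

using VectorRegister = std::array<float, kLaneCount>; // plays the role of v0, v1, ...
using LaneMask = std::bitset<kLaneCount>;             // plays the role of vcc

struct Number { VectorRegister x, y, z, w; }; // structure of arrays

// Like v_cmp_gt_f32 vcc, 0, v: one boolean per lane, set where 0 > v.
LaneMask LessThanZero(const VectorRegister& v)
{
    LaneMask mask;
    for (std::size_t lane = 0; lane < kLaneCount; ++lane)
        mask[lane] = v[lane] < 0.0f;
    return mask;
}

// Like v_add_f32 followed by v_cndmask_b32: the sum is computed for every
// lane, but kept only in lanes where the mask is set.
VectorRegister AddWhere(const VectorRegister& v, float increment, const LaneMask& mask)
{
    VectorRegister result{};
    for (std::size_t lane = 0; lane < kLaneCount; ++lane)
    {
        const float added = v[lane] + increment;
        result[lane] = mask[lane] ? added : v[lane];
    }
    return result;
}

// if(number.x < 0.0) number += vec4(1.0, 2.0, 3.0, 4.0); for 64 pixels at once.
void IncreaseIfNegative(Number& number)
{
    const LaneMask needsIncrease = LessThanZero(number.x); // 64 booleans, not one
    number.x = AddWhere(number.x, 1.0f, needsIncrease);
    number.y = AddWhere(number.y, 2.0f, needsIncrease);
    number.z = AddWhere(number.z, 3.0f, needsIncrease);
    number.w = AddWhere(number.w, 4.0f, needsIncrease);
}
```

Note that there is no branch here at all: every lane executes the same instructions, and the mask decides whose results are kept.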


# Two Shader Compilers of Direct3D 12

Dec 2019

If we write a game or other graphics application using DX12, we also need to write some shaders. We author these in a high-level language called HLSL and compile them before passing them to the DirectX API while creating pipeline state objects (ID3D12Device::CreateGraphicsPipelineState). There are currently two shader compilers available, both from Microsoft, each outputting a different binary format:

  1. old compiler “FXC”
  2. new compiler “DXC”

Which one to choose? The new compiler, called DirectX Shader Compiler, is more modern, based on LLVM/Clang, and open source. We must use it if we want to use Shader Model 6 or above. On the other hand, shaders compiled with it require a relatively recent version of Windows and graphics drivers installed, so they won’t work on systems not updated for years.

Shaders can be compiled offline using a command-line program (standalone executable compiler) and then bundled with your program in compiled binary form. That’s probably the best way to go for release version, but for development and debugging purposes it’s easier if we can change shader source just as we change the source of CPU code, easily rebuild or run, or even reload changed shader while the app is running. For this, it’s convenient to integrate shader compiler as part of your program, which is possible through a compiler API.

This gives us 4 different ways of compiling shaders. This article is a quick tutorial for all of them.

1. Old Compiler - Offline

The standalone executable of the old compiler is called “fxc.exe”. You can find it bundled with the Windows SDK, which is installed together with Visual Studio. For example, on my system I located it in this path: “c:\Program Files (x86)\Windows Kits\10\bin\10.0.17763.0\x64\fxc.exe”.

To compile a shader from HLSL source to the old binary format, issue a command like this:

fxc.exe /T ps_5_0 /E main PS.hlsl /Fo PS.bin

/T is target profile
ps_5_0 means pixel shader with Shader Model 5.0
/E is the entry point - the name of the main shader function, “main” in my case
PS.hlsl is the text file with shader source
/Fo is binary output file to be written

There are many more command line parameters supported for this tool. You can display help about them by passing /? parameter. Using appropriate parameters you can change optimization level, other compilation settings, provide additional #include directories, #define macros, preview intermediate data (preprocessed source, compiled assembly), or even disassemble existing binary file.

2. Old compiler - API

To use the old compiler as a library in your C++ program:

  • #include <d3dcompiler.h>
  • link with "d3dcompiler.lib"
  • call function D3DCompileFromFile


CComPtr<ID3DBlob> code, errorMsgs;
HRESULT hr = D3DCompileFromFile(
    L"PS.hlsl", // pFileName
    nullptr, // pDefines
    nullptr, // pInclude
    "main", // pEntrypoint
    "ps_5_0", // pTarget
    0, // Flags1
    0, // Flags2
    &code, // ppCode
    &errorMsgs); // ppErrorMsgs
if(FAILED(hr))
{
    if(errorMsgs)
    {
        wprintf(L"Compilation failed with errors:\n%hs\n",
            (const char*)errorMsgs->GetBufferPointer());
    }
    // Handle compilation error...
}

// (...)
psoDesc.PS.BytecodeLength = code->GetBufferSize();
psoDesc.PS.pShaderBytecode = code->GetBufferPointer();
CComPtr<ID3D12PipelineState> pso;
hr = device->CreateGraphicsPipelineState(&psoDesc, IID_PPV_ARGS(&pso));

The first parameter is the path to the file that contains HLSL source. If you want to load the source in some other way, there is also a function that takes a buffer in memory: D3DCompile. The second parameter (optional) can specify preprocessor macros to be #define-d during compilation. The third parameter (optional) can point to your own implementation of the ID3DInclude interface that would provide additional files requested via #include. The entry point and target profile are strings, just like in the command-line compiler. Other options that have their command line parameters (e.g. /Zi, /Od) can be specified as bit flags.

Two objects returned from this function are just buffers of binary data. ID3DBlob is a simple interface that you can query for its size and pointer to its data. In case of a successful compilation, ppCode output parameter returns buffer with compiled shader binary. You should pass its data to ID3D12PipelineState creation. After successful creation, the blob can be Release-d. The second buffer ppErrorMsgs contains a null-terminated string with error messages generated during compilation. It can be useful even if the compilation succeeded, as it then contains warnings.

Update: "d3dcompiler_47.dll" file is needed. Typically some version of it is available on the machine, but generally you still want to redistribute the exact version you're using from the Win10 SDK. Otherwise you could end up compiling with an older or newer version on an end-user's machine.

3. New Compiler - Offline

Using the new compiler in its standalone form is very similar to the old one. The executable is called “dxc.exe” and it’s also bundled with the Windows SDK, in the same directory. Documentation of the command line syntax mentions parameters starting with "-", but the old "/" also seems to work. To compile the same shader using Shader Model 6.0, issue the following command, which looks almost the same as for "fxc.exe":

dxc.exe -T ps_6_0 -E main PS.hlsl -Fo PS.bin

Despite using a new binary format (called “DXIL”, based on LLVM IR), you can load it and pass it to D3D12 PSO creation the same way as before. There is a tricky issue though. You need to ship the file “dxil.dll” with your program. Otherwise, the PSO creation will fail! You can find this file in a Windows SDK path like: “c:\Program Files (x86)\Windows Kits\10\Redist\D3D\x64\dxil.dll”. Just copy it to the directory with the target EXE of your project or the one that you use as the working directory.
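Loading a shader that was compiled offline is plain file I/O. Here is a minimal, portable C++ sketch; `LoadBinaryFile` is a hypothetical helper, not part of any SDK, and the file name "PS.bin" is just the one used in the commands above.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Reads a whole file into memory. Returns an empty vector when the file
// cannot be opened, so the caller should treat "empty" as an error.
std::vector<char> LoadBinaryFile(const std::string& path)
{
    std::ifstream file(path, std::ios::binary);
    if (!file)
        return {};
    std::istreambuf_iterator<char> begin(file), end;
    return std::vector<char>(begin, end);
}

// Intended use with D3D12 (not compiled here, as it needs the Windows SDK):
//   std::vector<char> code = LoadBinaryFile("PS.bin");
//   psoDesc.PS.pShaderBytecode = code.data();
//   psoDesc.PS.BytecodeLength = code.size();
```

The vector has to stay alive until the pipeline state object has been created, because the PSO description only stores a pointer to the data.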

4. New Compiler - API

The new compiler can also be used programmatically as a library, but its usage is a bit more difficult. Just as with any C++ library, start with:

  • #include <dxcapi.h>
  • link "dxcompiler.lib"
  • create and use object of type IDxcCompiler

This time though you need to bundle an additional DLL with your program (next to “dxil.dll” mentioned above): “dxcompiler.dll”, to be found in the same “Redist\D3D\x64” directory. There is more code needed to perform the compilation. First create IDxcLibrary and IDxcCompiler objects. They can stay alive for the whole lifetime of your application or as long as you need to compile more shaders. Then for each shader, load it from a file (or any source of your choice) to a blob, call the Compile method, and inspect its result, whether it’s an error + a blob with error messages, or a success + a blob with compiled shader binary.

CComPtr<IDxcLibrary> library;
HRESULT hr = DxcCreateInstance(CLSID_DxcLibrary, IID_PPV_ARGS(&library));
//if(FAILED(hr)) Handle error...

CComPtr<IDxcCompiler> compiler;
hr = DxcCreateInstance(CLSID_DxcCompiler, IID_PPV_ARGS(&compiler));
//if(FAILED(hr)) Handle error...

uint32_t codePage = CP_UTF8;
CComPtr<IDxcBlobEncoding> sourceBlob;
hr = library->CreateBlobFromFile(L"PS.hlsl", &codePage, &sourceBlob);
//if(FAILED(hr)) Handle file loading error...

CComPtr<IDxcOperationResult> result;
hr = compiler->Compile(
    sourceBlob, // pSource
    L"PS.hlsl", // pSourceName
    L"main", // pEntryPoint
    L"ps_6_0", // pTargetProfile
    NULL, 0, // pArguments, argCount
    NULL, 0, // pDefines, defineCount
    NULL, // pIncludeHandler
    &result); // ppResult
if(SUCCEEDED(hr))
    result->GetStatus(&hr); // Status of the compilation itself
if(FAILED(hr))
{
    if(result)
    {
        CComPtr<IDxcBlobEncoding> errorsBlob;
        hr = result->GetErrorBuffer(&errorsBlob);
        if(SUCCEEDED(hr) && errorsBlob)
        {
            wprintf(L"Compilation failed with errors:\n%hs\n",
                (const char*)errorsBlob->GetBufferPointer());
        }
    }
    // Handle compilation error...
}
CComPtr<IDxcBlob> code;
result->GetResult(&code);

// (...)
psoDesc.PS.BytecodeLength = code->GetBufferSize();
psoDesc.PS.pShaderBytecode = code->GetBufferPointer();
CComPtr<ID3D12PipelineState> pso;
hr = device->CreateGraphicsPipelineState(&psoDesc, IID_PPV_ARGS(&pso));

The compilation function also takes strings with the entry point and target profile, but in Unicode format this time. The way to pass additional flags has also changed. Instead of bit flags, the parameters pArguments and argCount take an array of strings that can specify additional parameters, the same as you would pass to the command-line compiler, e.g. L"-Zi" to attach debug information or L"-Od" to disable optimizations.
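One way to build that array is a small helper that maps your own settings to command-line style strings. The sketch below is portable C++; `CompileOptions` and `BuildDxcArguments` are hypothetical names invented for this example, and only the two flags mentioned above are handled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical settings structure of your own application.
struct CompileOptions
{
    bool debugInfo = false;            // maps to -Zi
    bool disableOptimizations = false; // maps to -Od
};

// Builds the array of strings to pass as pArguments. The string literals
// have static storage duration, so the pointers remain valid.
std::vector<const wchar_t*> BuildDxcArguments(const CompileOptions& options)
{
    std::vector<const wchar_t*> args;
    if (options.debugInfo)
        args.push_back(L"-Zi");
    if (options.disableOptimizations)
        args.push_back(L"-Od");
    return args;
}

// Intended use (not compiled here, as it needs the Windows SDK):
//   auto args = BuildDxcArguments(options);
//   compiler->Compile(sourceBlob, L"PS.hlsl", L"main", L"ps_6_0",
//       args.data(), (UINT32)args.size(), // pArguments, argCount
//       NULL, 0, NULL, &result);
```

With no options set the vector is empty, so a count of 0 is passed, just like in the listing above.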

Update 2020-01-05: Thanks @MyNameIsMJP for your feedback!


# Xiaomi Smart Band - a Very Good Purchase

Dec 2019

Despite being a programmer, I'm quite conservative with technological novelties. For example, I never owned a tablet and never felt a need to have one. I was also aware there are smart watches on the market, but the idea of spending 300 EUR on another device that I would then need to charge every day seemed like too much for me. Nor was I interested in those smart bands that claim to monitor your pulse, sleep, and count your steps.

What made me revisit those types of devices was a real need I felt many times recently. Sometimes I was attending a conference, sitting in a talk with my smartphone switched to quiet mode. I was repeatedly pulling it out to check the time or to see if anyone tried to call me or sent me a message. Those are the situations where it's important not to miss the talk I want to see and to catch up with my friends, while having my phone ringing would be undesirable. Another time I was at a party in a club or at a concert where the music was so loud I couldn't hear or feel my phone ringing, while that's also the time when I repeatedly check the clock not to miss the show of my favourite DJ while trying to hang out with my friends. Then I thought: maybe a smart watch could provide those two simple things: show the current time and display notifications from text messages, Messenger etc., notifying about them using vibration?

After some research I found out that smart bands do exactly that if I disregard all sport-related features. I chose Xiaomi Mi Smart Band 4, but devices from other manufacturers would probably provide a similar experience. It surprised me that it costs only 140 PLN (33 EUR). After charging the battery, installing the special "Mi Fit" app on my Android phone and pairing the two devices using Bluetooth, I could configure my smart band to change wallpaper etc. (BTW the default ones are terrible, but their format has been reverse engineered, so there are many user-created watchfaces available to download on the Internet.)

This device has only 512 KB of RAM and a 120x240 pixel screen, but even with those parameters looking like some Atari computer from 30 years ago it provides many useful features. First and foremost, it shows the current time and date - for 5 seconds after being activated using the touch screen. It also vibrates when there is an incoming call or a new message. You can configure which of the apps installed on the smartphone also display their notifications on the band, but these can be basically the same as the notifications on the phone, so any messaging app will work - whether Messenger, WhatsApp, Signal, Tinder, or standard text messages. Sender and text can be seen on the band, but responding requires pulling out the phone. Additional features I like are showing current weather and the weather forecast for the following days for the current location, timer, and alarms, which can wake you up using vibration - useful not to wake up other people sleeping in the same room. The biggest surprise for me was battery life. 20 days declared by the manufacturer seemed unrealistic, but after I charged it for the first time on November 20th, it worked until... yesterday, which gives 29 days. One disadvantage I can see is that now I must have Bluetooth enabled on my phone all the time, which drains its battery faster, but I charge the phone every day anyway.

This article is not sponsored. I wrote it on my own initiative, just to share my experiences with this type of device. If you consider it useful to have current time and incoming messages available on your wristband without a need to pull out your phone, a smart band like Xiaomi Mi Smart Band 4 is a good choice.


# Vulkan Memory Allocator - budget management

Nov 2019

Querying for memory budget and staying within the budget is a much-needed feature of the Vulkan Memory Allocator library. I implemented a prototype of it on a separate branch "MemoryBudget".

It also contains documentation of all new symbols and a general chapter "Staying within budget" that describes this topic. Documentation is pregenerated so it can be accessed by just downloading the repository as ZIP, unpacking, and opening file "docs\html\index.html" > chapter “Staying within budget”.

If you are interested, please take a look. Any feedback is welcome - you can leave your comment below or send me an e-mail. Now is the best time to adjust this feature to users' needs before it gets into the official release of the library.

Long story short:

  • A function is added to query for current memory usage and available budget per Vulkan memory heap.
  • If you enable the extension VK_EXT_memory_budget and tell VMA about it, the extension is used for that query. If not, current usage and budget are estimated based on the total size of currently allocated blocks and 80% of heap sizes, respectively.
  • If you are close to exceeding the budget or it is already exceeded, the library doesn’t allocate another default 256 MB memory block. It instead tries to allocate a smaller block or even a dedicated allocation just for your resource, to stay within the budget.
  • It still tries to make the allocation and leaves to Vulkan the decision whether the allocation succeeds or fails, unless you use new VMA_ALLOCATION_CREATE_WITHIN_BUDGET_BIT, which causes the allocation to just return failure if it would go over budget.
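The logic from the list above can be sketched in a few lines of standalone C++. This is not the library's code and does not use its API: the names `HeapBudget`, `EstimateBudget` and `ChooseAllocationSize` are hypothetical, and halving the block size until it fits is an assumption made only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint64_t kMiB = 1024ull * 1024ull;
constexpr std::uint64_t kDefaultBlockSize = 256 * kMiB;

struct HeapBudget
{
    std::uint64_t usage;  // bytes currently in use
    std::uint64_t budget; // bytes we should stay under
};

// Fallback estimation when VK_EXT_memory_budget is not used:
// usage = total size of allocated blocks, budget = 80% of the heap size.
HeapBudget EstimateBudget(std::uint64_t heapSize,
                          const std::vector<std::uint64_t>& blockSizes)
{
    std::uint64_t usage = 0;
    for (const std::uint64_t size : blockSizes)
        usage += size;
    return { usage, heapSize / 10 * 8 };
}

// Picks the size of the next device memory allocation for a resource.
// Prefers a default block, shrinks it (assumption: by halving) while it
// does not fit in the remaining budget, down to a dedicated allocation of
// exactly resourceSize. Returns 0 (failure) only when withinBudgetOnly is
// set and even the dedicated allocation would exceed the budget.
std::uint64_t ChooseAllocationSize(const HeapBudget& heap,
                                   std::uint64_t resourceSize,
                                   bool withinBudgetOnly)
{
    const std::uint64_t remaining =
        heap.budget > heap.usage ? heap.budget - heap.usage : 0;
    std::uint64_t size = std::max(kDefaultBlockSize, resourceSize);
    while (size > resourceSize && size > remaining)
        size = std::max(size / 2, resourceSize);
    if (size > remaining && withinBudgetOnly)
        return 0;
    return size; // may still exceed the budget; Vulkan decides if it succeeds
}
```

For example, with a 1000 MiB heap (800 MiB estimated budget) and three 256 MiB blocks already allocated, a 10 MiB resource would get a 32 MiB block instead of another 256 MiB one.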

Update 2019-12-20: This has been merged to master branch and shipped with the latest major release: Vulkan Memory Allocator 2.3.0.


# Further improvements on my website

Oct 2019

Have you noticed any changes on my website? Probably not - and that’s the whole point. I’ve made a few improvements on the technical side of it, but it’s still working as usual. Here is a brief story of the development of my home page...

I was never a passionate web developer, but I learned a bit of some languages and technologies needed to make a web page. When I started this one in 2004, the word “blog” was already in use, but there was no “cloud”, no Node.js or Ruby on Rails. I could either buy a hosting account with PHP scripting and MySQL database on the back end, or a Linux shell account with full SSH access, which would be much more expensive. Surely I chose the first option. Besides that, there was HTML 4.01 and CSS 1 on the client’s side.

Over time, I introduced gradual improvements to my home page, including:

  • Started blogging in English instead of Polish (since June 2009).
  • Installed Google Custom Search for searching within this page (see text box in the top-right corner).
  • Added Atom feed.
  • Used mod_rewrite to support nice looking URLs like “/news_1657_title_of_my_post” instead of original ones like “/news.php5?action=view&id=1657”
  • Registered in Google Search Console, added sitemap.
  • Switched from old-fashioned layout based on HTML <table>s to more modern based on <div>s and CSS formatting. Started using HTML5. (See Changes on My Website.)
  • Installed external service Disqus for comments instead of my script.
  • Changed the CSS style sheet according to the idea of “responsive design” to make the site friendly to mobile devices, like smartphones.

For some time I thought maybe I should rewrite this whole website from scratch. Then there would be a difficult question to answer: What technology to use? I don’t know web technologies well, but I know there are many of them. I could just install WordPress or some other blogging system and somehow move all the existing content there. I could rewrite all the scripts using more modern PHP 7 or a more trendy language, like Ruby or server-side JavaScript. I could even make it all static HTML content, which would be enough for the things I have here. Then I could use some offline tool to generate those pages, or write my own. I could also use Amazon S3 to host those pages. The possibilities are endless...

Then I recalled the rule that “if it ain’t broke, don’t fix it” and thought the hosting service I now use at company is quite good with a low price for WWW + PHP + MySQL + FTP + e-mail account. I decided eventually just to improve the existing solution. Here is what I’ve changed recently:

  • Added support for HTTPS, with help of my hosting company, who generated an SSL certificate for my domains. It’s not that important for a static website intended just for reading, but argues that everyone should use it and modern web browsers warn about “connection not secure” when using HTTP, so it was worth doing. The official address of my home page is now!
  • Converted static pages and database content to UTF-8 character encoding. Until this week the page was still using ISO-8859-2 (latin2) codepage. Again, this doesn’t make much difference on a page using only English and Polish characters, but argues that everyone should use UTF-8, so I wanted to be up-to-date with the latest trends :)

If you have any suggestions about my website, whether its looks or technical details, please leave a comment.

#homepage #webdev

# Book review: C++17 in Detail

Oct 2019

Courtesy of its author, Bartłomiej Filipek, I was given an opportunity to read a new book, "C++17 in Detail". Here is my review:

When I am about to read a book or decide whether to buy one, I first look at two things. These are not the looks of the cover or the description on the back. Instead, I check the table of contents and the number of pages. Together they give me a good overview of the topics covered and an estimate of the chances that they are covered sufficiently. "C++17 in Detail" with its 361 pages looks good for a book describing what's new in the C++17 standard, considering that the additions to the standard are not as extensive as they were in C++11. The author is undoubtedly an expert in this field, as seen from the entries on his blog, Bartek's coding blog.

The author claims to describe all the significant additions to the language. However, this is not a dull, difficult to read documentation of the new language elements, like you can find on Instead, the book describes each of them by giving some background and rationale, and showing real-life examples. This makes them easy to understand and their usefulness easy to appreciate. Each addition to the standard is also accompanied by a reference to the official documents of the C++ standard committee and a table showing which versions of the most popular C++ compilers (GCC, Clang, Microsoft Visual C++) support it. Spoiler: They already support almost all of them :)

The book doesn't teach everything from scratch. That would be impossible in that number of pages, considering how big and complex C++ is. It assumes the reader already knows the language quite well, including some features from C++11 like unique_ptr or r-value references and move semantics. It does, however, explain in more detail a few topics needed for the book, like the concept of "reduce" and "scan" parallel algorithms, which C++17 adds to the standard library.

The content of the book is grouped into 3 parts. Part 1 describes additions to the C++ language itself, including the init statement for if and switch (e.g. if(int i = Calculate(); i > 0) ...), additions to templates like if constexpr, and attributes like [[nodiscard]] and [[maybe_unused]]. Part 2 describes what has been added to the standard library, including std::optional, variant, any, string_view, and filesystem. Finally, Part 3 shows more extensive code examples that combine multiple new C++ features to refactor existing code into cleaner and more efficient code. The author also mentions which parts of the language have been deprecated or removed in the new standard (like auto_ptr).
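To give a flavor of a few of the features listed above, here is a minimal sketch of my own, not taken from the book. The names ParsePositive and Describe are made up for illustration. It combines an if with init statement, [[nodiscard]], std::optional, std::string_view and if constexpr:

```cpp
#include <charconv>
#include <optional>
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical helper: parses a positive integer from the whole text.
// [[nodiscard]] asks the compiler to warn when the caller ignores the result.
[[nodiscard]] std::optional<int> ParsePositive(std::string_view text)
{
    const char* const end = text.data() + text.size();
    int value = 0;
    // if with init statement: `result` is visible only inside this if.
    if (auto result = std::from_chars(text.data(), end, value);
        result.ec == std::errc{} && result.ptr == end && value > 0)
    {
        return value;
    }
    return std::nullopt; // Not a number, trailing garbage, or not positive.
}

// Hypothetical helper: if constexpr discards the branch that does not apply
// to T, so each instantiation compiles only the code that is valid for it.
template <typename T>
std::string Describe(const T& value)
{
    if constexpr (std::is_integral_v<T>)
        return "integer " + std::to_string(value);
    else
        return "text " + std::string(value);
}
```

With these definitions, ParsePositive("42") holds the value 42, while ParsePositive("12abc") and ParsePositive("-5") are empty. Describe(7) and Describe(std::string_view("abc")) each compile only their own branch.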

To summarize, I recommend this book to any C++ developer. It's a good one, and it lets you stay up-to-date with the language standard. You will learn all the new features of the language and its standard library from it in a more pleasant way than by reading documents from the C++ committee. Even if you can't use these new features in your current project, because your old compiler has not been upgraded for many years or the coding standard imposed by your team lead doesn't let you, I think it's worth learning those things. Who knows if you won't be asked about them at your next job interview?

You can buy printed version of the book on and electronic version on Leanpub. Bartek, the author of the book, also agreed to give all of the readers of my blog a nice discount - 30%. It's valid till the end of October, and to use it just visit this link.

#C++ #books

# Weirdest rules from coding standards

Sep 2019

Earlier this month I asked on Twitter "what is the weirdest and the most stupid rule you had to follow because of the "Coding Standard"?" I've got some interesting responses. Thinking about it more, I concluded that coding standards are complex. Having one in your project is a good thing because it imposes a consistent style, which is a value in itself. But specific rules are of various types. Some carry universally recognized good practices, like "use std::unique_ptr, don't use std::auto_ptr". Some serve good code performance, like "pass large structures as const& parameters, not by value". Others are purely a matter of subjective preference of their authors, e.g. to use CamelCaseIdentifiers rather than snake_case_identifiers or spaces instead of tabs for indentation. Even the division between those categories is not clear, though. For example, there is research showing that Developers Who Use Spaces Make More Money Than Those Who Use Tabs.
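Going back to the two sensible rules quoted above (std::unique_ptr instead of std::auto_ptr, and const& parameters for large structures), this is roughly what they look like in code. It is only a minimal sketch: Mesh, CountVertices and MakeMesh are hypothetical names that don't come from any particular coding standard.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical large structure, used only for illustration.
struct Mesh
{
    std::vector<float> vertices; // 3 floats (x, y, z) per vertex
};

// Passed as const&: the vector inside is not copied on every call.
std::size_t CountVertices(const Mesh& mesh)
{
    return mesh.vertices.size() / 3;
}

// std::unique_ptr expresses sole ownership and frees the object automatically.
// std::auto_ptr, which it replaces, was deprecated in C++11 and removed in C++17.
std::unique_ptr<Mesh> MakeMesh(std::size_t vertexCount)
{
    auto mesh = std::make_unique<Mesh>();
    mesh->vertices.resize(vertexCount * 3);
    return mesh;
}
```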

But some rules are simply ridiculous and hard to explain in a rational way. Here are two examples from my experience:

Number 2: Lengthy ASCII-art comment required before every function. In that project we couldn't write an inline function even for the simplest getters, like:

class Element
{
    int identifier;
public:
    int GetIdentifier() { return identifier; } // Illegal!
};

We had to only declare member functions in the header file, while the definition had to contain a specific comment that repeats the name of the function (which is nonsense and a bad practice in itself, as it introduces duplication and may go out of sync with the actual code), its description (even if the name is self-descriptive), a description of all its parameters (even if their names are self-descriptive), the return value etc. Example:



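/*****************************************************************************\
Function:
    Element::GetIdentifier
Description:
    Returns identifier.
Input:
    None.
Output:
    Identifier of the current element.
\*****************************************************************************/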
int Element::GetIdentifier()
{
    return identifier;
}

I like comments. I believe they are useful to explain and augment the information carried by function and variable names, especially when they document valid usage of a library interface. For example, a comment may say that a pointer can be null and what that means, that a uint may have the special value UINT32_MAX and what happens then, or that a float is expressed in seconds. But the comment shown above doesn't add any useful information. It's just more symbols to type and developer's time wasted, and it makes the code bloated and less readable. It's not even in any standard format from which documentation could be generated automatically, like with Doxygen. It's just a custom, arbitrary rule.
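For contrast, here is a sketch of the kind of comment I find useful, written in Doxygen style. The function RemainingTime and its parameters are hypothetical, invented only for this example. Each line of the comment says something that the signature alone cannot: units, a special value, and whether a pointer may be null.

```cpp
#include <cstdint>
#include <limits>

/// Computes how long a task may still run before it times out.
/// \param elapsed Time the task has already been running, in seconds.
/// \param timeoutMs Timeout in milliseconds. The special value UINT32_MAX
///   means no timeout: the function then returns infinity.
/// \param outExpired Optional, can be null when the caller doesn't need it.
///   If not null, receives true when the timeout has already passed.
/// \return Remaining time in seconds, never negative.
float RemainingTime(float elapsed, std::uint32_t timeoutMs, bool* outExpired)
{
    if (timeoutMs == UINT32_MAX)
    {
        if (outExpired != nullptr)
            *outExpired = false;
        return std::numeric_limits<float>::infinity();
    }
    // Convert the timeout to seconds before comparing it with elapsed time.
    const float remaining = static_cast<float>(timeoutMs) / 1000.0f - elapsed;
    if (outExpired != nullptr)
        *outExpired = remaining <= 0.0f;
    return remaining > 0.0f ? remaining : 0.0f;
}
```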

What was the reason behind this rule? A colleague once told me that many years ago the architect of this whole program hoped that they would develop a tool to parse all this code and those comments and generate documentation. Decades have passed, and it didn't happen, but developers still had to write those comments.

The effect was that everyone avoided adding new functions as much as possible or splitting their code into small functions. They were just adding more and more code to the existing ones, which could grow to hundreds of lines. That also caused one of the most obscure bugs I've met in my life (bug number 2).

Number 1: Don't use logical negation operator '!', like: if(!isEnabled). Always compare with false instead, like: if(isEnabled == false).

I understand a requirement to always compare pointers and numbers to some value like nullptr instead of treating them as booleans, although I don't like it. But banning one of the fundamental operators, even when used with bool variables, is hard for me to justify.
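To make the two styles concrete, here is a small sketch with made-up function names. The first two functions behave identically for every bool argument. They differ only in how the condition is spelled. The third one shows the related requirement about pointers.

```cpp
// Style banned by the rule: logical negation operator.
int PickSpeedNegation(bool isEnabled)
{
    if (!isEnabled)
        return 0;
    return 100;
}

// Style required by the rule: explicit comparison with false.
int PickSpeedComparison(bool isEnabled)
{
    if (isEnabled == false)
        return 0;
    return 100;
}

// The related requirement: compare a pointer with nullptr explicitly
// instead of treating it as a boolean, as in: if (value).
int ReadOrDefault(const int* value, int defaultValue)
{
    if (value != nullptr)
        return *value;
    return defaultValue;
}
```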

Why would anyone come up with something like this? Is it because a single '!' symbol is easy to omit when writing or reading and not as explicit as == false? Then, if the author of this rule suffers from bad sight or dyslexia and a single dash here and there doesn't make a difference to him, maybe he should also define functions Add(), Subtract(), and tell developers to use them instead of operators '+' and '-', because they too are so easy to confuse? Or maybe not... He should rather go do something other than programming :)

#software engineering #c++

Copyright © 2004-2020