# Tag: math

Entries for tag "math", ordered from most recent. Entry count: 58.

Warning! Some information on this page is older than 5 years now. I keep it for reference, but it probably doesn't reflect my current knowledge and beliefs.

Pages: 1 2 3 ... 8

# Operations on power of two numbers

Sun
09
Sep 2018

Numbers that are powers of two (i.e. 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024 and so on...) are especially important in programming, due to the way computers work - they operate on binary representation. Sometimes there is a need to ensure that certain number is power of two. For example, it might be important for size and alignment of some memory blocks. This property simplifies operations on such quantities - they can be manipulated using bitwise operations instead of arithmetic ones.

In this post I'd like to present efficient algorithms for 3 common operations on power-of-2 numbers, in C++. I do it just to gather them in one place, because they can be easily found in many other places all around the Internet. These operations can be implemented using other algorithms as well. Most obvious implementation would involve a loop over bits, but that would give O(n) time complexity relative to the number of bits in operand type. Following algorithms use clever bit tricks to be more efficient. They have constant or logarithmic time and they don't use any flow control.

1. Check if a number is a power of two. Examples:

```IsPow2(0)   == true (!!)
IsPow2(1)   == true
IsPow2(2)   == true
IsPow2(3)   == false
IsPow2(4)   == true
IsPow2(123) == false
IsPow2(128) == true
IsPow2(129) == false```

This one I know off the top of my head. The trick here is based on an observation that a number is power of two when its binary representation has exactly one bit set, e.g. 128 = 0b10000000. If you decrement it, all less significant bits become set: 127 = 0b1111111. Bitwise AND checks if these two numbers have no bits set in common.

```template <typename T> bool IsPow2(T x)
{
return (x & (x-1)) == 0;
}```

2. Find smallest power of two greater or equal to given number. Examples:

```NextPow2(0)   == 0
NextPow2(1)   == 1
NextPow2(2)   == 2
NextPow2(3)   == 4
NextPow2(4)   == 4
NextPow2(123) == 128
NextPow2(128) == 128
NextPow2(129) == 256```

This one I had in my library for a long time.

```uint32_t NextPow2(uint32_t v)
{
v--;
v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8;
v |= v >> 16;
v++;
return v;
}
uint64_t NextPow2(uint64_t v)
{
v--;
v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8;
v |= v >> 16; v |= v >> 32;
v++;
return v;
}```

3. Find largest power of two less or equal to given number. Examples:

```PrevPow2(0) == 0
PrevPow2(1) == 1
PrevPow2(2) == 2
PrevPow2(3) == 2
PrevPow2(4) == 4
PrevPow2(123) == 64
PrevPow2(128) == 128
PrevPow2(129) == 128```

I needed this one just recently and it took me a while to find it on Google. Finally, I found it in this post on StackOveflow.

```uint32_t PrevPow2(uint32_t v)
{
v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8;
v |= v >> 16;
v = v ^ (v >> 1);
return v;
}
uint64_t PrevPow2(uint64_t v)
{
v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8;
v |= v >> 16; v |= v >> 32;
v = v ^ (v >> 1);
return v;
}```

Update 2018-09-10: As I've been notified on Twitter, C++20 is also getting such functions as standard header <bit>.

# Pitfalls of Floating-Point Numbers - Slides

Sat
24
Sep 2016

Just as I announced in my previous blog post, today I gave a lecture on a "Kariera IT" event - organized by CareerCon, dedicated to IT jobs.

Here you can find slides from my presentation, in Polish. It's called "PuĆapki liczb zmiennoprzecinkowych" ("Pitfalls of floating-point numbers").

Here are links to the Floating-Point Formats Cheatsheet (in English) that I mentioned in my presentation:

# Watch out for reduced precision normalize/length in OpenGL ES

Mon
27
Apr 2015

GLSL language for OpenGL ES introduces concept of precision. You can annotate variable declaration (both float and int/uint) with a precision qualifier: `highp`, `mediump` or `lowp`, like:

`mediump float a = 3.0;`

You can also specify default precision qualifier by using precision statement. Language specification defines minimum required range and precision for each precision qualifier.

• `highp` basically means normal, single-precision, 32-bit float (IEEE 754), as we know it from CPU programming.
• `mediump` is said to have have range of at least -2^14 ... 2^14 and relative precision 2^-10, so it can be, for example, implemented using a 16-bit, half-precision float.
• `lowp` is said to have range at least -2 ... 2 and absolute precision 2^-8, so basically it can be stored as a 10-bit, fixed-point number.

GPU vendors are free to use more precise data types, or even full 32-bit float for all of them. What exact precision is used depends on specific GPU and maybe even operating system or graphics driver version. Using smaller data types can occupy less memory, calculate faster and consume less battery power. But it comes at the price of reduced precision and range of these numbers. Tom Olson wrote interesting articles about this: "Benchmarking floating-point precision in mobile GPUs": Part I, Part II, Part III.

In this post I'd like to warn you against a specific problem related to it - usage of `length()`, `normalize()` and `distance()` functions. Using smaller data types not only limits precision in terms of number of significant digits, but also available range (over which the value will saturate to -INF/+INF). For mediump, this range is defined as +/-2^14, which is only 16384.

This may still look like a lot, but let's remember that calculating vector length involves intermediate value that it sum of squares of this components. This can grow very big before a square root is applied. For example, for 3D vector:

`length(a) = sqrt(a.x*a.x + a.y*a.y + a.z*a.z)`

If you do this operation on a mediump vector, the term `a.x*a.x + a.y*a.y + a.z*a.z` can exceed maximum value for vector as small as (74.0, 74.0, 74.0). It can be very dangerous if you do something like this in your fragment shader:

`precision mediump float;uniform vec3 light_pos;...void main(){    ...    vec3 dir_to_light = normalize(pos - light_pos);    // Calculate your lighting and so on...`

You might ask: Why isn't this intermediate value stored in high precision before taking its square root to avoid this overflow problem? Obviously it could be, as precision in any place of the shader is free to be higher than the minimum allowed in that place, so some GPU vendors can do it this way, but you shouldn't rely on this. GLSL specification clearly says that the shader is free to use same, reduced precision for intermediate values.

The precision used to internally evaluate an operation, and the precision qualification subsequently associated with any resulting intermediate values, must be at least as high as the highest precision qualification of the operands consumed by the operation.

Conclusion is: When you write shaders for OpenGL ES, watch out for operations that involve calculating vector length (or dot product) like `length()`, `normalize()`, `distance()`, use `highp` precision for vectors involved and remember that what works on one GPU due to using precision higher than minimum required, may not work on another GPU and it’s still an application issue.

# Floating-Point Formats Cheatsheet

Thu
13
Jun 2013

Floating-point numbers (or floats in short) are not as simple as integer numbers. There is much to be understood when dealing with these numbers on low level - basic things like the sign + exponent + significand representation (and that exponent is biased, while significand has implicit leading 1), why you should never compare calculation results operator ==, that some fractions with finite decimal representation cannot be represented exactly in binary etc., as well as why there are two zeros -0 and +1, what are infinite, NaN (Not a Number) and denorm (denormal numbers) and how they behave. I won't describe it here. It's not an arcane knowledge - you can find many information about this on the Web, starting from Wikipedia article.

But after you understand these concepts, quantitative questions come to mind, like: how many significant decimal digits can we expect from precision of particular float representation (half, single, double)? What is the minimum non-zero value representable in that format? What range of integers can we represent exactly? What is the maximum value? And finally: if our game crashes with "Access violation, reading location 0x3f800000", what chances are that we mistaken pointer for a float number, as this is one of common values, meaning 1.0?

So to organize such knowledge, I created a "Floating-Point Formats" cheatsheet:

# SBX - Scale-Bias Transform

Wed
23
Jan 2013

A class of 2D or 3D point and vector transformations called affine transformations (that is - linear transformation plus translation) can be conveniently represented by matrix. But sometimes we don't need possibility of input x to influence output y or vice versa. For example, that's the case when we want to convert coordinates of mouse cursor from pixels, like we take it from WinAPI (where X goes right from 0 to e.g. 1280 and Y goes down from 0 to e.g. 720) to post-projective 3D space (where X goes right from -1.0 to +1.0 and Y goes up from -1.0 on the bottom to +1.0 on the top). Then every component can be transformed independently using linear transform, like this:

`Parameters are: scale.xy, bias.xy.Given input point i.xy, output point o.xy is:o.x = i.x * scale.x + bias.xo.y = i.y * scale.y + bias.y`

This seems trivial, but what if we wanted to encapsulate such transformation in a new data structure and define operations on it, like we do on matrices? Let's call it SBX - Scale-Bias Transform. I propose following structure:

`struct ScaleBiasXform{    static const ScaleBiasXform IDENTITY;    vec2 Scale;    vec2 Bias;};`

And following functions:

`void Construct(ScaleBiasXform& out,    const vec2& input1, const vec2& output1,    const vec2& input2, const vec2& output2);void Scaling(ScaleBiasXform& out, float scale);void Scaling(ScaleBiasXform& out, const vec2& scale);void Translation(ScaleBiasXform& out, const vec2& vec);void Inverse(ScaleBiasXform& out, const ScaleBiasXform& sbx);void Combine(ScaleBiasXform& out, const ScaleBiasXform& sbx1, const ScaleBiasXform& sbx2);void TransformPoint(vec2& out, const vec2& point, const ScaleBiasXform& sbx);void UntransformPoint(vec2& out, const vec2& point, const ScaleBiasXform& sbx);void TransformVector(vec2& out, const vec2& vec, const ScaleBiasXform& sbx);void UntransformVector(vec2& out, const vec2& vec, const ScaleBiasXform& sbx);`

• vec2 is obviously structure containing: float x, y;
• vec2 has some overloaded operators, including lhs*rhs and lhs/rhs, which perform per-component multiplication or division.
• common::CalcLinearFactors used in CPP file is function from my CommonLib, defined in Base.hpp.

# Particle System - How to Store Particle Age?

Wed
12
Dec 2012

Particle effects are nice because they are simple and look interesting. Besides, coding it is fun. So I code it again :) Particle systems can have state (when parameters of each particle are calculated based on previous values and time step) or stateless (when parameters are always recalculated from scratch using fixed function and current time). My current particle system has the state.

Today a question came to my mind about how to store age of a particle to delete it after some expiration time, determined by the emitter and also unique for each particle. First, let's think for a moment about the operations we need to perform on this data. We need to: 1. increment age by time step 2. check if particle expired and should be deleted.

If that was all, the solution would be simple. It would be enough to store just one number, let's call it TimeLeft. Assigned to the particle life duration at the beginning, it would be:

`Step with dt: TimeLeft = TimeLeft - dtDelete if: TimeLeft <= 0`

But what if we additionally want to determine the progress of the particle lifetime, e.g. to interpolate its color of other parameters depending on it? The progress can be expressed in seconds (or whatever time unit we use) or in percent (0..1). My first idea was to simply store two numbers, expressed in seconds: Age and MaxAge. Age would be initialized to 0 and MaxAge to particle lifetime duration. Then:

`Step with dt: Age = Age + dtDelete if: Age > MaxAgeProgress: AgePercent progress: Age / MaxAge`

Looks OK, but it involves costly division. So I came up with an idea of pre-dividing everything here by MaxAge, thus defining new parameters: AgeNorm = Age / MaxAge (which goes from 0 to 1 during particle lifetime) and AgeIncrement = 1 / MaxAge. Then it gives:

`Step with dt: AgeNorm = AgeNorm + dt * AgeIncrementDelete if: AgeNorm > 1Progress: Age / AgeIncrementPercent progress: AgeNorm`

This needs additional multiplication during time step and division when we want to determine progress in seconds. But as I consider progress in percent more useful than the absolute progress value in seconds, that's my solution of choice for now.

# DirectXMath - A First Look

Sun
15
Apr 2012

Programmers using DirectX have access to the accompanying library full of routines that support various 3D math calculations. First it was D3DX - auxilliary library tightly coupled with DirectX itself. It's still there, but some time ago Microsoft released a new library called XNA Math. I've written about it here. This library also ships with latest versions of DirectX SDK (which was last updated in Jun 2010), but it is something completely new - a more independent, small, elegant and efficient library, portable between Windows (where it can use SSE2 or old FPU) and Xbox 360.

Now Microsoft comes up with another library called DirectXMath. See the official documentation and introductory blog post for details. As a successor of XNA Math, it looks very similar. Main differences are:

• They dropped support for Xbox 360. The library is now portable between Windows (using SSE2 or FPU) and ARM (using NEON).
• It is now a C++ library, makes use of namespaces and templates, so it no longer works with C.
• It makes use of new C++11 standard (and requires compiler that supports it), including integer types from <cstdint> like uint32_t instead of types from <Windows.h> like DWORD.

Sounds great? Well, I didn't tell yet how to get it. It's not shipped with DirectX SDK. You need the SDK for Windows 8. They say that:

The library is annotated using SAL2 (rather than the older VS-style or Windows-style annotations), so requires the standalone Windows SDK for Windows 8 Consumer Preview or some additional fixups to use with Visual C++ 2010.

So probably I'm not going to switch to this new library anytime soon :P

# Ball Physics

Sat
17
Mar 2012

Checking collision between different kinds of 2D or 3D shapes is a subject I deal with for some time. It is useful in game development to determine if some object is visible or affected by light and this if it should be rendered. I have lots of function to check such collisions in the math module of CommonLib library.

But that is only the beginning if you want to make physics. Adding physical behavior to your game requires additional calculations to correct positions and apply forces to colliding bodies. I prepared a small code snippet that implements physical 2D collision between moving circle and another moving circle (calculated by CircleToCircleCollision function), static line (CircleToLineCollision function) and static axis-aligned rectangle (CircleToRectangleCollision function).