Entries for tag "commonlib", ordered from most recent. Entry count: 5.
# Check of CommonLib using PVS-Studio
Mon 13 Nov 2017
Courtesy of the developers of PVS-Studio, I was able to use this static code analysis tool to check my home projects. I blogged about the tool recently. It's powerful and especially good at finding issues with 32/64-bit portability. This time I analyzed CommonLib - a library I have developed and used for many years, which contains various facilities for C++ programming. That allowed me to find and fix many bugs. Here are the most interesting ones I found:
Comments | #commonlib #c++ #pvs-studio Share
# Ported CommonLib to 64 Bits
Sun 12 Nov 2017
I recently updated my CommonLib library to support 64-bit code. This is a change I had planned for a very long time. I wrote this code so many years ago that 64-bit compilation on Windows was not yet popular back then. Today the situation is the opposite - a modern Windows application or game can be 64-bit only. I decided to make the library work in both configurations. Here is how I did it:
The most important and also most frequent change I needed to make in the code was to use the `size_t` type for every size (like the number of bytes), index (for indexing arrays), count (number of elements in some collection), length (e.g. number of characters in a string) etc. This is the "native" type intended for that purpose, and as such it's 32-bit or 64-bit depending on the configuration. This is also the type returned by the `sizeof` operator and functions like `strlen`. Before making this change, I carelessly mixed `size_t` with `uint32`.
I sometimes use the maximum available value (all ones) as a special value, e.g. to communicate "not found". Before the change I used the `UINT_MAX` constant for that. Now I use `SIZE_MAX`.
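A minimal sketch of both conventions (the `FindIndex` helper is hypothetical, not from CommonLib):

```cpp
#include <cassert>
#include <cstdint>
#include <cstddef>
#include <vector>

// Hypothetical helper illustrating the convention: sizes and indices are
// size_t, and SIZE_MAX (all bits set) serves as the "not found" sentinel.
size_t FindIndex(const std::vector<int>& v, int value)
{
    for (size_t i = 0; i < v.size(); ++i)
        if (v[i] == value)
            return i;
    return SIZE_MAX; // "not found" - was UINT_MAX in the old 32-bit code
}
```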
I know there is a group of developers who advocate using signed integers for indices and counts, but I'm among the proponents of unsigned types, just like the C++ standard library uses. In those very rare situations where I need a signed size, the correct solution is to use the `ptrdiff_t` type. This is the case where I apply the concept of "stride", which I blogged about here (Polish) and here.
I have a hierarchy of "Stream" classes that represent a stream of bytes that can be read from or written to. Derived classes offer a unified interface for a buffer in memory, a file and many others. Before the change, all the offsets and sizes in my streams were 32-bit. I needed to decide how to change them. My first idea was to use `size_t`. It seems a natural choice for `MemoryStream`, but it doesn't make sense for `FileStream`, as files can exceed 4 GB in size regardless of the 32/64-bit configuration of my code. Such files were not supported correctly by this class. I eventually decided to always use 64-bit types for stream size and cursor position.
There are cases where my usage of `size_t` in the library interface can't carry down to the implementation, because the underlying API supports only 32-bit sizes. That's the case with zlib (a data compression library that I wrapped into stream classes in files ZlibUtils.*), as well as BSTR (the string type used by Windows COM, which I wrapped into a class in files BstrString.*). In those cases I just put asserts in the code checking whether the actual size doesn't exceed the 32-bit maximum before downcasting. I know it's not a safe solution - I should report an error there (throw an exception), but well... That's only my personal code anyway :)
```cpp
BstrString::BstrString(const wchar_t *wcs, size_t wcsLen)
{
    (...)
    assert(wcsLen < (size_t)UINT_MAX);
    Bstr = SysAllocStringLen(wcs, (UINT)wcsLen);
}
```
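The safer variant mentioned above could look like this sketch (the `DownsizeToU32` helper is my own illustration, not part of CommonLib):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Throws instead of silently truncating when a size_t value doesn't fit
// into the 32-bit type required by an underlying API (zlib, BSTR, ...).
inline uint32_t DownsizeToU32(size_t s)
{
    if (s > std::numeric_limits<uint32_t>::max())
        throw std::length_error("Size exceeds 32-bit limit of underlying API.");
    return (uint32_t)s;
}
```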
Among other issues with 64-bit compatibility, the only remaining one was the following macro:
```cpp
/// Assert that ALWAYS breaks the program when false
/// (in debugger - hits a breakpoint, without debugger - crashes the program).
#define ASSERT_INT3(x) if ((x) == 0) { __asm { int 3 } }
```
The problem is that Visual Studio offers inline assembly in 32-bit code only. I needed to use the `__debugbreak` intrinsic function from the `intrin.h` header instead.
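The fixed macro could look roughly like this (a sketch; the non-MSVC branch using `SIGTRAP` is my own addition for portability, not part of the original library):

```cpp
#include <csignal>
#if defined(_MSC_VER)
    #include <intrin.h>
    // On MSVC the __debugbreak intrinsic works in both 32- and 64-bit builds.
    #define ASSERT_INT3(x) do { if ((x) == 0) __debugbreak(); } while (0)
#else
    // Portable fallback: raise SIGTRAP, which a debugger treats as a breakpoint.
    #define ASSERT_INT3(x) do { if ((x) == 0) std::raise(SIGTRAP); } while (0)
#endif
```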
Overall, porting this library went quickly and smoothly, without any major issues. Now I just need to port the rest of my code to 64 bits.
Comments | #visual studio #c++ #commonlib Share
# LZMA SDK - How to Use
Sat 08 May 2010
What do you think about when I say the word "compression"? If you currently study computer science, you probably think about details of algorithms like RLE, Huffman coding or the Burrows-Wheeler transform. If not, then you surely associate compression with archive file formats such as ZIP and RAR. But there is something in between - libraries that let you compress data: they implement a compression algorithm but do not provide a ready file format to pack multiple files and directories into one archive. Such a library is useful in gamedev for creating a VFS (Virtual File System). Probably the most popular one is zlib - a free C library that implements the Deflate algorithm. I've recently discovered another one - LZMA. Its SDK is also free (public domain) and the basic version is a small C library (C++, C# and Java APIs are also available, as well as some additional tools). The library uses the LZMA algorithm (Lempel–Ziv–Markov chain algorithm, the same as in the 7z archive format), which AFAIK has a better compression ratio than Deflate. So I've decided to start using it. Here is what I've learned:
If you decide to use only the C API, it's enough to add some C and H files to your project - the ones from LZMASDK\C directory (without subdirectories). Alternatively you can compile them as a static library.
There is a little bit of theory behind the LZMA SDK API. First, the term props means a 5-byte header where the library stores some settings. It must be saved together with the compressed data and given to the library before decompression.
Next, the dictionary size. It is the size of a temporary memory block used during compression and decompression. The dictionary size can be set during compression and is then saved inside props. The library uses a dictionary of the same size during decompression. The default dictionary size is 16 MB, so IMHO it's worth changing, especially as I haven't noticed any meaningful drop in compression ratio when I set it to 64 KB.
And finally, an end mark can be saved at the end of the compressed data. You can use it so the decompression code can determine where the data ends. Alternatively, you can decide not to use the end mark, but then you must remember the exact size of the uncompressed data somewhere. I prefer the second method, because remembering the data size takes only 4 bytes (for a 4 GB limit) and can be useful anyway, while compressed data finished with an end mark is about 6 bytes longer than without it.
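The "remember the size" approach can be sketched like this (hypothetical helpers, not from the SDK): prepend the uncompressed size as a 4-byte header in front of the compressed payload.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Prepend the uncompressed size as a 4-byte header, so decompression
// knows exactly how many bytes to produce without needing an end mark.
std::vector<uint8_t> PrependSize(const std::vector<uint8_t>& compressed,
                                 uint32_t uncompressedSize)
{
    std::vector<uint8_t> out(4 + compressed.size());
    memcpy(out.data(), &uncompressedSize, 4); // assumes little-endian host
    memcpy(out.data() + 4, compressed.data(), compressed.size());
    return out;
}

uint32_t ReadSize(const std::vector<uint8_t>& framed)
{
    uint32_t size = 0;
    memcpy(&size, framed.data(), 4);
    return size;
}
```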
Compressing a full block of data with a single call is simple. You can find the appropriate functions in the LzmaLib.h header. Here is how you can compress a vector of bytes using the LzmaCompress function:
Comments | #commonlib #libraries #algorithms Share
# My Algorithms for Sorted Arrays
Sun 21 Mar 2010
Yesterday I wrote about The Beauty of Sorted Arrays. I believe the C++ STL is not perfect, so I've written some additional code that I needed for handling such a data structure. It can be found in my CommonLib library.
I once wanted to have a sorting algorithm just like std::sort, but using insertion sort. std::sort is of course great: it has average O(n log n) complexity and AFAIK internally uses the introsort algorithm (an improved version of quicksort). But if we know that the collection is already almost sorted, insertion sort will work faster. So here is my code:
```cpp
// Basic version that uses operator <
template<class Iterator>
void InsertionSort(Iterator b, Iterator e)
{
    if (b == e) return;
    for (Iterator j = b + 1; j < e; j++)
    {
        typename std::iterator_traits<Iterator>::value_type key;
        key = *j;
        Iterator i = j;
        while (i > b && key < *(i-1))
        {
            *i = *(i-1);
            i--;
        }
        *i = key;
    }
}

// Overloaded version with explicit comparator
template<class Iterator, typename Compare>
void InsertionSort(Iterator b, Iterator e, Compare Comp)
{
    if (b == e) return;
    for (Iterator j = b + 1; j < e; j++)
    {
        typename std::iterator_traits<Iterator>::value_type key;
        key = *j;
        Iterator i = j;
        while (i > b && Comp(key, *(i-1)))
        {
            *i = *(i-1);
            i--;
        }
        *i = key;
    }
}
```
Another problem is finding an item in an ordered array by some key. I can see two solutions in C++, both unsatisfactory. The first one is to write your own comparator or overload operator < for your item structure to compare items by key, like this:
```cpp
struct Item
{
    int Id;
    DWORD Color;
    std::string Name;

    bool operator < (const Item &rhs) const { return Id < rhs.Id; }
};
```
The problem is that a C++ comparator always compares two full items, so to use this for finding an item in the array, we must construct and pass a full Item object, even though only the Id field is used.
```cpp
Item item;
item.Id = 1;
bool contains = std::binary_search(vec.begin(), vec.end(), item);
```
Second solution is to use std::map container. It has separate key and value type so you can search by key:
```cpp
std::map<int, Item>::iterator foundIt = map.find(1);
if (foundIt != map.end())
    std::cout << "Found ID=" << foundIt->first
              << ", Name=" << foundIt->second.Name << std::endl;
```
But the problem is that the key is separate from the value type, so the Id field should not be included in the Item structure, because it becomes redundant. The type of items actually stored in the map is std::pair<const int, Item>, which is what std::map<int, Item>::value_type is aliased to.
I wanted to have more convenient way to quickly search an ordered array by key, so I've coded these two function templates:
```cpp
template <typename IterT, typename KeyT, typename CmpT>
IterT FirstNotLessIndex(IterT beg, IterT end, const KeyT &key, CmpT cmp)
{
    unsigned down = 0, up = (end - beg);
    while (down < up)
    {
        unsigned mid = (down + up) / 2;
        int res = cmp(key, *(beg + mid));
        if (res > 0)
            down = mid + 1;
        else
            up = mid;
    }
    return beg + down;
}

template <typename IterT, typename KeyT, typename CmpT>
IterT BinarySearch(IterT beg, IterT end, const KeyT &key, CmpT cmp)
{
    IterT it = FirstNotLessIndex<IterT, KeyT, CmpT>(beg, end, key, cmp);
    if (it == end || cmp(key, *it) != 0)
        return end;
    return it;
}
```
The first one performs a binary search to find the iterator to the first element not less than (greater than or equal to) the specified key. It's perfect for determining the place for inserting a new item into the array. The second function uses the first one to check whether an item with an exactly matching key exists in the array, and returns an iterator pointing to it, or the end iterator if not found.
As you can see, both functions take the type of the key as a template parameter, which may be different from the full item type pointed to by the iterator. They also take a comparator functor which is always called with parameters (key, item) and should return int. The comparator should return a number less than, equal to or greater than zero according to the result of the comparison - just like strcmp, memcmp and other functions from old C.
Now my example can be written like this:
```cpp
struct Item
{
    int Id;
    DWORD Color;
    std::string Name;

    static int Cmp(int id, const Item &rhs) { return id - rhs.Id; }
};

bool contains = common::BinarySearch(vec.begin(), vec.end(), 1, &Item::Cmp) != vec.end();
```
Having such universal comparison function, it's easy to convert it to all comparison operators:
```cpp
bool operator <  (const Item &rhs) const { return Cmp(Id, rhs) <  0; }
bool operator >  (const Item &rhs) const { return Cmp(Id, rhs) >  0; }
bool operator <= (const Item &rhs) const { return Cmp(Id, rhs) <= 0; }
bool operator >= (const Item &rhs) const { return Cmp(Id, rhs) >= 0; }
bool operator == (const Item &rhs) const { return Cmp(Id, rhs) == 0; }
bool operator != (const Item &rhs) const { return Cmp(Id, rhs) != 0; }
```
The opposite conversion is also possible. If you have a type with operator < available (like float, or your own type with an overloaded operator), you can convert it to a comparator returning int with this function:
```cpp
template <typename T>
inline int UniversalCmp(const T &a, const T &b)
{
    if (a < b) return -1;
    if (b < a) return  1;
    return 0;
}
```
Comments | #commonlib #stl #c++ #algorithms Share
# Processing File Paths in C++
Tue 12 Jan 2010
Paths to files and directories are a special kind of strings. Sometimes they have to be processed, e.g. merged or split into parts like drive letter, directory, file name and extension. The standard C library provides two functions for this: makepath and splitpath. They use char* strings and always operate on all possible parts (drive, dir, fname, ext). In my opinion they are not very convenient. On the other hand, modern programming languages have more functionality in this field. For example, .NET has the System.IO.Path static class with methods like GetExtension, GetFileName, GetDirectoryName.
To make operations on file paths convenient in C++, some time ago I developed my own functions for that, which operate on std::string strings. I've included them in my CommonLib library. They are a bit similar to the ones from Delphi :) Here is how I approach this topic: First, a common task is to extract part of a path (e.g. "C:\Windows\notepad.exe"). My functions for this are:
I also have some functions for processing paths:
A path can be absolute or relative. In Windows, absolute paths start with a drive letter like "C:\" (which we can recognize by looking for the colon) or a network share (paths that begin with "\\"). Relative paths (where the most basic example is a single file name) make sense only in some context. When we use relative paths to open real files, we address them relative to the "Current Directory" (which is a separate topic - I will write about it another time). To deal with absolute and relative paths, I've coded functions such as:
The NormalizePath function processes all the . and .. directories in the path. A single dot used as a directory name means "my own directory" (it just does nothing) and two dots mean going up one level to the parent directory. So the path we consider here is logically equal to "C:\Windows\.\system32\..\notepad.exe". Paths like this still work, but they can look a bit unclear, so you can use NormalizePath to clean them up.
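A minimal sketch of such normalization (my own simplified take, assuming backslash separators and enough parent components - not the actual CommonLib implementation):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Removes "." components and resolves ".." against the preceding component.
// Simplified sketch: assumes '\\' separators and no error handling.
std::string NormalizePath(const std::string &path)
{
    std::vector<std::string> parts;
    size_t start = 0;
    while (start <= path.size())
    {
        size_t sep = path.find('\\', start);
        if (sep == std::string::npos) sep = path.size();
        std::string part = path.substr(start, sep - start);
        if (part == "..") {
            if (!parts.empty()) parts.pop_back(); // go up one level
        }
        else if (part != ".")
            parts.push_back(part); // "." means "this directory": skip it
        start = sep + 1;
    }
    std::string out;
    for (size_t i = 0; i < parts.size(); ++i)
    {
        if (i) out += '\\';
        out += parts[i];
    }
    return out;
}
```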
Paths in Linux are quite different. My functions also support them, although I haven't tested this in real situations yet. First of all, there are no drive letters in Linux, so absolute paths just start with '/', like "/etc/passwd". Second, a slash is used as the separator instead of a backslash (interestingly, a slash also works in Windows). It's also more common in Linux than in Windows to meet files that have no extension or have a double extension (like "MyFile.tar.gz"). A question arises here: does what we call the "extension" start from the first or from the second dot? :)
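For illustration, the common convention (the one .NET's GetExtension also uses) is to take everything from the last dot; a hypothetical helper, not from CommonLib:

```cpp
#include <cassert>
#include <string>

// Returns the extension including the dot, taking the LAST dot in the
// file name - so "MyFile.tar.gz" yields ".gz", not ".tar.gz".
// Sketch: expects a bare file name, not a full path.
std::string ExtractFileExt(const std::string &fileName)
{
    size_t dotPos = fileName.find_last_of('.');
    if (dotPos == std::string::npos)
        return std::string(); // no extension at all
    return fileName.substr(dotPos);
}
```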