December 2010

Warning! Some information on this page is older than 6 years now. I keep it for reference, but it probably doesn't reflect my current knowledge and beliefs.

# Different Ways of Processing Data

Tue, 14 Dec 2010

Much is being said about UX these days, but not so much about the art and science of designing a good API, so that libraries can communicate successfully with their users, that is, other programmers. I have been interested in the latter for some time, and today I'd like to explore how some data, whether objects or a raw sequence of bytes, can be processed, loaded and saved. I will clarify what I mean in just a moment. I believe the ways to do it can be grouped into several categories, ranging from the simplest but most limited to the most flexible and efficient, but also the most difficult to use.

1. The simplest possible interface for loading (or saving) some data is to pass a string with a path to a file. That's how DLL libraries are loaded in WinAPI. Unfortunately, it limits the programmer to loading objects only from physical files, not from other places such as a memory buffer, where the source data could be placed e.g. after decompression or downloading from the network.

```cpp
HMODULE WINAPI LoadLibrary(__in LPCTSTR lpFileName);
```
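To make the limitation concrete, here is a minimal sketch of a path-only interface. LoadBytesFromFile and LoadFromMemoryViaTempFile are hypothetical names used only for illustration, not part of any real library. The point is that a caller whose data is already in memory has to take a detour through the file system.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical path-only interface: a file name is the only way in.
std::vector<unsigned char> LoadBytesFromFile(const std::string& path)
{
    std::ifstream file(path, std::ios::binary);
    // A file that cannot be opened simply yields an empty vector here.
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(file),
                                      std::istreambuf_iterator<char>());
}

// Data already in memory (e.g. downloaded or decompressed) must be
// written to a temporary file just to reach the path-only API.
std::vector<unsigned char> LoadFromMemoryViaTempFile(
    const std::vector<unsigned char>& data, const std::string& tempPath)
{
    {
        std::ofstream out(tempPath, std::ios::binary);
        out.write(reinterpret_cast<const char*>(data.data()),
                  static_cast<std::streamsize>(data.size()));
    } // the stream is closed (and flushed) here, before reading back
    std::vector<unsigned char> result = LoadBytesFromFile(tempPath);
    std::remove(tempPath.c_str()); // clean up the detour
    return result;
}
```

The extra write, read and delete are pure overhead that an API accepting memory buffers would avoid.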

2. One solution to this problem is an API that allows loading data from either a file or memory. An example is texture loading in D3DX (an extension to DirectX), where separate functions are available that take either a path to a disk file (LPCTSTR pSrcFile), a pointer to a buffer in memory (LPCVOID pSrcData, UINT SrcDataSize) or a Windows resource identifier (HMODULE hSrcModule, LPCTSTR pSrcResource).

```cpp
HRESULT D3DXCreateTextureFromFile(
  __in LPDIRECT3DDEVICE9 pDevice,
  __in LPCTSTR pSrcFile,
  __out LPDIRECT3DTEXTURE9 *ppTexture);
HRESULT D3DXCreateTextureFromFileInMemory(
  __in LPDIRECT3DDEVICE9 pDevice,
  __in LPCVOID pSrcData,
  __in UINT SrcDataSize,
  __out LPDIRECT3DTEXTURE9 *ppTexture);
HRESULT D3DXCreateTextureFromResource(
  __in LPDIRECT3DDEVICE9 pDevice,
  __in HMODULE hSrcModule,
  __in LPCTSTR pSrcResource,
  __out LPDIRECT3DTEXTURE9 *ppTexture);
```
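One common way to offer several entry points without duplicating the parsing logic is to make the memory-based function the core and have the file-based one read the whole file and delegate to it. The sketch below assumes a made-up format (a hypothetical Image with a one-byte width and height header followed by pixel bytes) and hypothetical functions CreateImageFromMemory and CreateImageFromFile. It is not a claim about how D3DX is implemented internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <optional>
#include <string>
#include <vector>

// Hypothetical resource: header of two bytes (width, height),
// followed by width * height pixel bytes.
struct Image {
    std::uint8_t width = 0;
    std::uint8_t height = 0;
    std::vector<std::uint8_t> pixels;
};

// Core entry point: parses data that is already in memory.
std::optional<Image> CreateImageFromMemory(const void* srcData, std::size_t srcDataSize)
{
    if (srcDataSize < 2) return std::nullopt; // no room for the header
    const auto* bytes = static_cast<const std::uint8_t*>(srcData);
    Image img;
    img.width = bytes[0];
    img.height = bytes[1];
    const std::size_t pixelCount = std::size_t(img.width) * img.height;
    if (srcDataSize - 2 < pixelCount) return std::nullopt; // truncated data
    img.pixels.assign(bytes + 2, bytes + 2 + pixelCount);
    return img;
}

// Convenience entry point: loads the file into memory, then delegates.
std::optional<Image> CreateImageFromFile(const std::string& srcFile)
{
    std::ifstream file(srcFile, std::ios::binary);
    if (!file) return std::nullopt;
    std::vector<char> data((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());
    return CreateImageFromMemory(data.data(), data.size());
}
```

With this layering, a resource-based variant would only need to locate the resource bytes and call the same core function.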

Another possible approach is to use a single function and interpret the given pointer as either a string with a file path or a direct memory buffer, depending on some flags. That's how you can load sound samples in the FMOD library:

```cpp
FMOD_RESULT System::createSound(
  const char * name_or_data,
  FMOD_MODE mode,
  FMOD_CREATESOUNDEXINFO * exinfo,
  FMOD::Sound ** sound
);
```

Here, name_or_data is documented as the "Name of the file or URL to open, or a pointer to a preloaded sound memory block if FMOD_OPENMEMORY/FMOD_OPENMEMORY_POINT is used."
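The single-function variant boils down to a flag that decides how one pointer is interpreted. The sketch below uses hypothetical stand-ins (ModeFlags, MODE_OPEN_MEMORY, SoundSource, CreateSound), not FMOD's real types. One detail any such design must handle is that a raw pointer carries no size, so the length of a memory block has to travel separately. FMOD passes it through its exinfo structure; here it is a plain parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <vector>

// Hypothetical mode flags, loosely modelled on the idea of FMOD_OPENMEMORY.
enum ModeFlags : unsigned {
    MODE_DEFAULT     = 0,
    MODE_OPEN_MEMORY = 1u << 0, // treat nameOrData as a memory block
};

struct SoundSource {
    std::vector<char> bytes; // raw, still-encoded sound data
};

// One entry point, two interpretations of the same pointer.
// 'length' is used only with MODE_OPEN_MEMORY, because a raw pointer
// does not tell us how large the block it points to is.
bool CreateSound(const char* nameOrData, unsigned mode, std::size_t length,
                 SoundSource* outSound)
{
    if (nameOrData == nullptr || outSound == nullptr) return false;
    if (mode & MODE_OPEN_MEMORY) {
        // Binary data may contain '\0', so it is copied by length.
        outSound->bytes.assign(nameOrData, nameOrData + length);
        return true;
    }
    // Otherwise nameOrData is a null-terminated file name.
    std::ifstream file(nameOrData, std::ios::binary);
    if (!file) return false;
    outSound->bytes.assign(std::istreambuf_iterator<char>(file),
                           std::istreambuf_iterator<char>());
    return true;
}
```

The price of this design is that the compiler cannot catch a mismatch between the flag and what the pointer really refers to.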

3. That's more flexible, but sometimes an object is so big that it's inefficient or even impossible to load, decompress or download its full contents into memory before creating the actual resource or doing some processing. What's needed is an interface that processes smaller chunks of data at a time. One way to do it is to define an interface with callbacks that the library calls to request the next piece of data. We can then implement this interface to read data from any source we wish, whether a simple disk file, a compressed archive or a network socket. When we want to load an object, we call the appropriate function, passing a pointer to our implementation of the interface. During this call, our code is called back and asked to read data. For example, that's how sounds can be loaded in the Audiere library. The interface for reading data is:

```cpp
class File : public RefCounted {
public:
  ADR_METHOD(int) read(void* buffer, int size) = 0;
  ADR_METHOD(bool) seek(int position, SeekMode mode) = 0;
  ADR_METHOD(int) tell() = 0;
};
```
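The callback style can be sketched in a few lines: the library defines an abstract reader, the user implements it for whatever source they have, and the library pulls data in chunks. Reader, MemoryReader and ComputeChecksum are hypothetical names (Audiere's real interface is the File class shown above), and the checksum merely stands in for real decoding work.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical callback interface, in the spirit of Audiere's File class.
class Reader {
public:
    virtual ~Reader() = default;
    // Copies up to 'size' bytes into 'buffer'; returns bytes copied, 0 at end.
    virtual std::size_t Read(void* buffer, std::size_t size) = 0;
};

// One user-side implementation: reading from a block of memory.
// Others could wrap a file, an archive entry or a socket.
class MemoryReader : public Reader {
public:
    MemoryReader(const void* data, std::size_t size)
        : data_(static_cast<const std::uint8_t*>(data)), size_(size) {}
    std::size_t Read(void* buffer, std::size_t size) override {
        const std::size_t n = std::min(size, size_ - pos_);
        if (n > 0) std::memcpy(buffer, data_ + pos_, n);
        pos_ += n;
        return n;
    }
private:
    const std::uint8_t* data_;
    std::size_t size_;
    std::size_t pos_ = 0;
};

// Library-side code: pulls data through the callback in small chunks,
// so the whole input never has to be held in memory at once.
std::uint32_t ComputeChecksum(Reader& reader)
{
    std::uint8_t chunk[4]; // deliberately tiny to force several calls
    std::uint32_t sum = 0;
    std::size_t n;
    while ((n = reader.Read(chunk, sizeof(chunk))) > 0) {
        for (std::size_t i = 0; i < n; ++i)
            sum = sum * 31 + chunk[i];
    }
    return sum;
}
```

The library controls the loop here; the user code only answers "give me up to N more bytes".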

4. A step further towards flexibility and generality is the concept of streams, as found in Java, C# or Delphi. The standard libraries of these object-oriented languages define abstract stream classes for reading (input) and writing (output) data that can be implemented in many ways. For example, Java's InputStream class defines methods such as:

```java
void close()
int read()
int read(byte[] b)
int read(byte[] b, int off, int len)
void reset()
long skip(long n)
```

Many derived classes are provided. Some of them read or write data from real sources like a file or a network connection, while others process data (for example, compress or encrypt it) and pass it on to another stream. This way a chain can be built: we write data to a stream that compresses it and passes it to another stream, which encrypts it and passes it to one that does buffering, which finally writes the data to a stream that saves it to a file. It's my favourite approach right now, although it has a drawback: the overhead of a virtual method call in each stream for each piece of data read or written. This inefficiency can be minimized by controlling granularity, i.e. processing a buffer of reasonable size at a time, never byte by byte.
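A minimal sketch of such a chain, assuming a hypothetical OutputStream base class with a virtual Write method. XorOutputStream is a toy transformation standing in for real encryption (XOR with a constant byte is not secure), BufferedOutputStream groups small writes into larger blocks to reduce the number of virtual calls further down the chain, and VectorOutputStream is the final sink in place of a file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical abstract output stream.
class OutputStream {
public:
    virtual ~OutputStream() = default;
    virtual void Write(const std::uint8_t* data, std::size_t size) = 0;
    virtual void Flush() {}
};

// Sink: stores everything in memory (a file stream would go here).
class VectorOutputStream : public OutputStream {
public:
    void Write(const std::uint8_t* data, std::size_t size) override {
        bytes.insert(bytes.end(), data, data + size);
        ++writeCalls;
    }
    std::vector<std::uint8_t> bytes;
    int writeCalls = 0; // how many virtual calls reached the sink
};

// Filter: transforms the data and forwards it to the next stream.
// XOR with a constant is only a placeholder for real encryption.
class XorOutputStream : public OutputStream {
public:
    XorOutputStream(OutputStream& next, std::uint8_t key) : next_(next), key_(key) {}
    void Write(const std::uint8_t* data, std::size_t size) override {
        std::vector<std::uint8_t> tmp(data, data + size);
        for (auto& b : tmp) b ^= key_;
        next_.Write(tmp.data(), tmp.size());
    }
    void Flush() override { next_.Flush(); }
private:
    OutputStream& next_;
    std::uint8_t key_;
};

// Filter: collects bytes and forwards them in blocks of 'capacity' bytes.
class BufferedOutputStream : public OutputStream {
public:
    BufferedOutputStream(OutputStream& next, std::size_t capacity)
        : next_(next), capacity_(capacity) { assert(capacity > 0); }
    void Write(const std::uint8_t* data, std::size_t size) override {
        for (std::size_t i = 0; i < size; ++i) {
            buffer_.push_back(data[i]);
            if (buffer_.size() == capacity_) FlushBuffer();
        }
    }
    void Flush() override { FlushBuffer(); next_.Flush(); }
private:
    void FlushBuffer() {
        if (!buffer_.empty()) next_.Write(buffer_.data(), buffer_.size());
        buffer_.clear();
    }
    OutputStream& next_;
    std::size_t capacity_;
    std::vector<std::uint8_t> buffer_;
};
```

Writing 10 single bytes into XorOutputStream over a BufferedOutputStream with capacity 4 makes only 3 Write calls reach the sink, the last one triggered by the explicit Flush. In this sketch nothing flushes automatically on destruction, so the caller must call Flush at the end.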

5. Finally, there is the most direct, low-level approach, which is also the most flexible and efficient, but at the same time very difficult to use properly. I'm talking about a single function that takes pointers to input and output buffers, as well as a structure containing the current state, and processes a piece of data. It consumes some or all data from the input buffer (by advancing a pointer or counter) and produces new data in the output buffer. There are no callbacks. The interface is neither "push" (where we write data) nor "pull" (where we read data), but both at once. That's how the zlib compression library works (which I have complained about elsewhere, in Polish), as well as the LZMA SDK (which I have described elsewhere).

```c
typedef struct z_stream_s {
  Bytef  *next_in;  /* next input byte */
  uInt   avail_in;  /* number of bytes available at next_in */
  uLong  total_in;  /* total nb of input bytes read so far */

  Bytef  *next_out; /* next output byte should be put there */
  uInt   avail_out; /* remaining free space at next_out */
  uLong  total_out; /* total nb of bytes output so far */

  char   *msg;    /* last error message, NULL if no error */
  struct internal_state FAR *state; /* not visible by applications */
  /* ... */
} z_stream;

ZEXTERN int ZEXPORT deflate OF((z_streamp strm, int flush));
```
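The essence of this style is that all progress is recorded in the state structure, so between calls the caller can refill the input or drain the output in any order. The sketch below mimics the next_in/avail_in/next_out/avail_out fields with a hypothetical HexState and a HexEncode function that turns each input byte into two hex characters. It is not zlib code, just the same calling convention applied to a trivial transformation. Because the output is larger than the input, a small output buffer can fill up mid-byte, so the unfinished half is kept in the state. EncodeAll, also hypothetical, shows the kind of driver loop that makes this interface hard to use correctly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical state, modelled on the z_stream fields shown above.
struct HexState {
    const unsigned char* next_in = nullptr; // next input byte
    std::size_t avail_in = 0;               // bytes available at next_in
    char* next_out = nullptr;               // next output char goes here
    std::size_t avail_out = 0;              // free space at next_out
    int pending = -1;                       // low-nibble char not yet written, or -1
};

// Consumes as much input and fills as much output as possible, then returns.
// No callbacks: the caller decides when to supply more input or more space.
void HexEncode(HexState* s)
{
    static const char digits[] = "0123456789abcdef";
    while (s->avail_out > 0) {
        if (s->pending >= 0) {            // finish a byte started earlier
            *s->next_out++ = static_cast<char>(s->pending);
            --s->avail_out;
            s->pending = -1;
        } else if (s->avail_in > 0) {     // start the next input byte
            const unsigned char b = *s->next_in++;
            --s->avail_in;
            *s->next_out++ = digits[b >> 4];
            --s->avail_out;
            s->pending = digits[b & 0x0F];
        } else {
            break;                        // need more input
        }
    }
}

// Driver loop: feeds input in pieces and drains output through a small buffer.
std::string EncodeAll(const std::string& input, std::size_t inChunk, std::size_t outChunk)
{
    assert(inChunk > 0 && outChunk > 0); // otherwise no progress is possible
    std::string result;
    std::vector<char> outBuf(outChunk);
    HexState s;
    std::size_t offset = 0;
    for (;;) {
        if (s.avail_in == 0 && offset < input.size()) { // refill input
            const std::size_t n = std::min(inChunk, input.size() - offset);
            s.next_in = reinterpret_cast<const unsigned char*>(input.data()) + offset;
            s.avail_in = n;
            offset += n;
        }
        s.next_out = outBuf.data();       // provide fresh output space
        s.avail_out = outBuf.size();
        HexEncode(&s);
        result.append(outBuf.data(), outBuf.size() - s.avail_out);
        if (s.avail_in == 0 && offset == input.size() && s.pending < 0)
            return result;                // all input consumed, nothing pending
    }
}
```

For example, EncodeAll("AB", 1, 3) produces "4142". Forgetting the pending check in the termination condition would silently drop the last character whenever the output buffer fills mid-byte, which is typical of the subtle bugs this style invites.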


Copyright © 2004-2024