Music Analysis - Spectrogram

Warning! Some information on this page is older than 3 years now. I keep it for reference, but it probably doesn't reflect my current knowledge and beliefs.

Jul 2010

Music Analysis - Spectrogram

I've started learning about sound analysis. I have some deficiencies in education when it comes to digital signal processing (greetings for the professor who taught this subject at our university ;) but Wikipedia comes to the rescue. As a starting point, here is a spectrogram I've made from one of my recent favourite songs: Sarge Devant feat. Emma Hewitt - Take Me With You.

Now I'm going to exaplain in details how I've done this by showing some C++ code. First I had to figure out how to decode an MP3, OGG or other compressed sound format. FMOD is my favourite sound library and I knew it can play many file formats. It took me some time though to find functions for fast decoding uncompressed PCM data from a song without actually playing it for all 3 minutes. I've found on the FMOD forum that Sound::seekData and Sound::readData can do the job. Finally I've finished with this code (all code shown here is stripped from error checking which I actually do everywhere):

#include <fmod.hpp>
#pragma comment(lib, "fmodex_vc.lib")

// Initialize FMOD.
// Set output device type to dummy/null.
FMOD::System *g_System;
g_System->init(100, FMOD_INIT_NORMAL, 0);

// Open the music file as stream, software mode.
FMOD::Sound *sound;
g_System->createStream("C:\\...mp3", FMOD_SOFTWARE, 0, &sound);

// Fetch song parameters
/* In my case it is:
freq=44100, Format=FMOD_SOUND_FORMAT_PCM16, Channels=2, Bits=16
Length=221570 ms, PCM=9771264, Bytes=39085056 */
float freq; FMOD_SOUND_FORMAT format; int channels, bits;
uint lenMs, lenPcm, lenPcmBytes;
sound->getDefaults(&freq, 0, 0, 0);
sound->getFormat(0, &format, &channels, &bits);
sound->getLength(&lenMs,       FMOD_TIMEUNIT_MS);
sound->getLength(&lenPcm,      FMOD_TIMEUNIT_PCM);
sound->getLength(&lenPcmBytes, FMOD_TIMEUNIT_PCMBYTES);
StartMusic(freq, format, channels, bits, lenMs, lenPcm, lenPcmBytes);

// Decode song data
static const uint BUF_SIZE = 0x10000;
uint8 buf[BUF_SIZE];
bool processing = true;
  uint bytesRead;
  res = sound->readData(buf, BUF_SIZE, &bytesRead);
  if (res == FMOD_ERR_FILE_EOF)
    processing = false;
  ProcessMusicBuf(buf, bytesRead);
while (processing);


Bytes entering ProcessMusicBuf function (always 0x10000 = 65536 bytes except the last call) are PCM samples arranged like this: each sample is a little-endian signed short number for left channel followed by similar a number for right channel. So each sample has 4 bytes. There are 44100 samples per second (this is the sampling frequency in Hz).

To make a spectrum, one need to transform data from time domain to freqency domain, that is to perform the Fourier Transform. I used FFTW library for this. Here is how I do it:

#include <fftw3.h>
#pragma comment(lib, "libfftw3f-3.lib")

static const uint FFT_INPUT_SAMPLES = 4096;
static const uint FFT_OUTPUT_SAMPLES = (FFT_INPUT_SAMPLES / 2) + 1;
static float g_InBuf[FFT_INPUT_SAMPLES], g_WindowedBuf[FFT_INPUT_SAMPLES];
static uint g_InBufPos = FFT_INPUT_SAMPLES/2;
static fftwf_complex g_OutBuf[FFT_OUTPUT_SAMPLES];
static fftwf_plan g_Plan;

// Initialization
g_Plan = fftwf_plan_dft_r2c_1d(FFT_INPUT_SAMPLES, g_WindowedBuf, g_OutBuf, FFTW_ESTIMATE);

for (uint i = 0; i < FFT_INPUT_SAMPLES/2; ++i)
  g_InBuf[i] = 0.f;

// Finalization

// Processing function
void ProcessMusicBuf(const void *buf, uint bytes)
  const uint8 *bufBytes = (const uint8*)buf;
  uint samples = bytes / 4;
  while (samples)
    uint samplesToCopy = std::min(samples, FFT_INPUT_SAMPLES - g_InBufPos);
    PostSamples(g_InBuf + g_InBufPos, bufBytes, samplesToCopy);
    g_InBufPos += samplesToCopy;
    if (g_InBufPos == FFT_INPUT_SAMPLES)
      memcpy(g_InBuf, g_InBuf + FFT_INPUT_SAMPLES/2, sizeof(float) * FFT_INPUT_SAMPLES / 2);
      g_InBufPos = FFT_INPUT_SAMPLES/2;

    samples -= samplesToCopy;
    bufBytes += samplesToCopy * 4;

// Helper functions

void PostSamples(float *outSamples, const uint8 *inPcmData, uint sampleCount)
  for (uint i = 0; i < sampleCount; ++i, inPcmData += 4, ++outSamples)
    short lSample = *(const short*)inPcmData;
    short rSample = *(const short*)(inPcmData + 2);
    *outSamples = ((float)lSample + (float)rSample) / 65536.f;

static void PostInBuf()
  for (uint i = 0; i < FFT_INPUT_SAMPLES; ++i)
    g_WindowedBuf[i] = g_InBuf[i];
    // Hamming window
    g_WindowedBuf[i] *= 0.54f - 0.46f * cosf( (PI * 2.f * i) / (FFT_INPUT_SAMPLES - 1) );


  // I rewrite to g_OutBuf[i][0] squared absolute value of a complex number g_OutBuf[i].
  for (uint i = 0; i < FFT_OUTPUT_SAMPLES; ++i)
    g_OutBuf[i][0] = g_OutBuf[i][0]*g_OutBuf[i][0] + g_OutBuf[i][1]*g_OutBuf[i][1];
  // see below...

Here I have a window for 4096 samples sliding along the data. PCM values (2 signed shorts) are averaged and rescaled to -1..1 for the g_InBuf by PostSamples function. g_InBuf is filled with zeros to the half at the beginning and then each time 2048 new samples is read, the window is advanced bo 2048 samples. So each time I process the buffer with PostInBuf, I have 2048 new samples on the right and 2048 old samples on the left. In other words, windows overlap by a half. I process buffer every 2048 samples which gives 0.04644 s.

I process g_InBuf into g_WindowedBuf using Hamming window. There exist many Window functions with different properties and my choice so far is rather arbitrary :)

Next I execute the FFT transform. 4096 real numbers on input processed by FFTW dft_r2c_1d transform gives 4096/2+1 = 2049 complex numbers at output, which I suppose can be interpreted as frequencies from 0 to 22050 Hz with 10.7666 Hz step. I don't know yet how could I use the phase of these complex numbers so I only measure squared absolute values (real*real + imag*imag), as they probably give the amplitude, energy or whatever at particular frequency and in particular point in time.

Finally to visualize my spectrum I use FreeImage library. It can allocate, fill and then save a bitmap in many different image file formats (like BMP, PNG, JPEG). Here is the code:

#include <FreeImage.h>
#pragma comment(lib, "FreeImage.lib")

static const uint imgSizeX = 4759;
static const uint imgSizeY = 256;
static FIBITMAP *bitmap;
static uint g_ImageX;

// Initialization
bitmap = FreeImage_Allocate((int)imgSizeX, (int)imgSizeY, 24);

// Finalization
FreeImage_Save(FIF_BMP, bitmap, "G:\\tmp\\Spectrogram.bmp");

void PostInBuf()
  // see above...
  if (g_ImageX < imgSizeX)
    uint samplesPerPixel = FFT_OUTPUT_SAMPLES / imgSizeY;
    float val;
    RGBQUAD pixel;
    for (uint y = 0, sample = 0; y < imgSizeY; ++y)
      // Average values over samplesPerPixel numbers.
      val = 0.f;
      for (uint localSample = 0; localSample < samplesPerPixel; ++localSample, ++sample)
        val += g_OutBuf[sample][0];
      val /= (float)samplesPerPixel;
      val = log10f(val + 1.f); // Logarithm.
      val *= 0.4f / 3.f; // Manually selected scaling.
      val = common::minmax(0.f, val, 1.f); // Clamp.
      val = powf(val, 1.f/2.2f); // Gamma (a computer graphics thing :)
      pixel.rgbRed = pixel.rgbGreen = pixel.rgbBlue = (uint8)(val * 255.f + 0.5f);
      FreeImage_SetPixelColor(bitmap, g_ImageX, y, &pixel);


I know these algorithms could be optimized, parallelized and so on, but now I'm more concerned about what to do with this further. What information can be extracted from music to be used for generating gameplay like in AudioSurf? What are good articles about this on the Internet?

The spectrogram I made already reveals some regular features, so I think it could be now processed as image :) But 2D image recognition and processing is another broad topic...

Another of my questions for those who managed to read up to this point is: wouldn't be faster and more convenient to do such math research in Scilab instead of C++? Is it possible to do such things in Scilab anyway?

Comments (5) | Tags: math libraries music rendering dsp algorithms | Author: Adam Sawicki | Share


2013-05-30 11:31:11
Nike air is one of the most popular brands of shoes ever released. The brand was created by Nike in 1985 for arguably the [url=]nike blazer[/url] best basketball player of all time, Micheal Jordan.
how to get ex back
2013-06-30 09:10:16
The spectrogram I made already reveals some regular features, so I think it could be now processed as image :) But 2D image recognition and processing is another broad topic...
<a href="https//">The spectrogram I made already reveals some regular features, so I think it could be now processed as image :) But 2D image recognition and processing is another broad topic...</a>
Coach outlet store online
2013-11-27 09:43:46
When it comes to Coach outlet store online, just remember the following rules:
1. Always be Coach outlet yourself.
2. Make the extra effort to look polished, put-together, and still like Coach factory outlet you.

Translation: We know you wouldn't show up in sweats (even the coach handbags new 2013), but it's not the best time to rock your ripped-up boyfriend jeans and crop top either. So, what should you wear? Here are 10 outfit ideas for the big meeting that are stylish and totally parent-proof. And don't worry, they're going to love you

Katy Perry online paired her stunning floral Oscar de la Renta ballgown with a bold Coach outlet store online at the American Music Awards Sunday night. Olympia is known for her signature clutches, which are all limited edition, hand-embroidered nods to literary history and Coach factory outlet culture. Her unique creations have inspired similar styles from brands like Coach handbags and Coach Outlet's Olympia.
2014-01-12 00:04:32
this is pretty cool! Super useful!

Thanks for the article!
2014-05-07 10:59:13 Beats By Dre Outlet Cheap Oakley Sunglaases Michael Kors Outlet Online North Clearace Outlet Ralph Polo Outlet Gucci Shoes UK Michael Kors Outlet Coach Factory Outlet North Face Outlet Online Polo Outlet Store Ralph Lauren UK Sac Longchamp Pairs Canada Goose Outlet Moncler Clearance Woolrich Clearance Barbour Jackets Outlet Online Moncler Jackets Outlet Online Coach Factory Outlet Online Burberry Outlet Air Max Shoes Michael Kors Outlet Coach Black Friday Coach Store Outlet Online Coach Outlet Coach Factory Coach Outlet USA Coach Purses Factory Coach Purses USA Coach Store Outlet Coach Purses On Sale Monster Beats Outlet Louis Vuitton Outlet Louis Vuitton Factory Marc Jacobs On Sale MCM Outlet MCM &#24215;&#38138; Michael Kors Factory Michael Kors Outlet Michael Kors Factory Online Coach Outlet Online Coach Factory Outlet Michael Kors Factory Outlet Coach Outlet Online USA Coach Factory Outlet Online Black Friday Coach Purses Outlet Online Michael Kors USA Michael Kors Outlet USA Ralph Lauren Polo

Post comment

Nick *
Your name or nickname
Your contact information (optional, will not be shown)
Text *
Content of your comment
Calculate *
(* - required field)
STAT NO AD [Stat] [Admin] [STAT NO AD] [pub] [Mirror] Copyright © 2004-2014 Adam Sawicki
Copyright © 2004-2014 Adam Sawicki