Music Analysis - Spectrogram

Warning! Some information on this page is older than 5 years now. I keep it for reference, but it probably doesn't reflect my current knowledge and beliefs.

Jul 2010

I've started learning about sound analysis. I have some gaps in my education when it comes to digital signal processing (greetings to the professor who taught this subject at our university ;) but Wikipedia comes to the rescue. As a starting point, here is a spectrogram I've made from one of my recent favourite songs: Sarge Devant feat. Emma Hewitt - Take Me With You.

Now I'm going to explain in detail how I did this by showing some C++ code. First I had to figure out how to decode MP3, OGG or another compressed sound format. FMOD is my favourite sound library and I knew it can play many file formats. It took me some time, though, to find the functions for quickly decoding a song into uncompressed PCM data without actually playing it for all 3 minutes. I found on the FMOD forum that Sound::seekData and Sound::readData can do the job. I ended up with this code (all code shown here is stripped of error checking, which I actually do everywhere):

#include <fmod.hpp>
#pragma comment(lib, "fmodex_vc.lib")

// uint and uint8 are shorthand typedefs for unsigned int and unsigned char.

// Initialize FMOD.
// Set output device type to dummy/null.
FMOD::System *g_System;
FMOD::System_Create(&g_System);
g_System->setOutput(FMOD_OUTPUTTYPE_NOSOUND);
g_System->init(100, FMOD_INIT_NORMAL, 0);

// Open the music file as stream, software mode.
FMOD::Sound *sound;
g_System->createStream("C:\\...mp3", FMOD_SOFTWARE, 0, &sound);

// Fetch song parameters
/* In my case it is:
freq=44100, Format=FMOD_SOUND_FORMAT_PCM16, Channels=2, Bits=16
Length=221570 ms, PCM=9771264, Bytes=39085056 */
float freq; FMOD_SOUND_FORMAT format; int channels, bits;
uint lenMs, lenPcm, lenPcmBytes;
sound->getDefaults(&freq, 0, 0, 0);
sound->getFormat(0, &format, &channels, &bits);
sound->getLength(&lenMs,       FMOD_TIMEUNIT_MS);
sound->getLength(&lenPcm,      FMOD_TIMEUNIT_PCM);
sound->getLength(&lenPcmBytes, FMOD_TIMEUNIT_PCMBYTES);
StartMusic(freq, format, channels, bits, lenMs, lenPcm, lenPcmBytes);

// Decode song data
// Rewind the read cursor first, because opening a stream already buffers some data.
sound->seekData(0);
static const uint BUF_SIZE = 0x10000;
uint8 buf[BUF_SIZE];
bool processing = true;
do
{
  uint bytesRead = 0;
  FMOD_RESULT res = sound->readData(buf, BUF_SIZE, &bytesRead);
  // Stop at the end of the file (FMOD_ERR_FILE_EOF) or on any other error.
  if (res != FMOD_OK)
    processing = false;
  // The last call may deliver fewer than BUF_SIZE bytes.
  ProcessMusicBuf(buf, bytesRead);
}
while (processing);


The bytes entering the ProcessMusicBuf function (always 0x10000 = 65536 bytes, except in the last call) are PCM samples arranged like this: each sample is a little-endian signed short for the left channel followed by a similar number for the right channel. So each sample takes 4 bytes. There are 44100 samples per second (this is the sampling frequency in Hz).
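
To make the byte layout concrete, here is a minimal sketch that splits such a buffer into left/right pairs. It assumes 16-bit stereo PCM exactly as described above. The names StereoFrame, ReadInt16LE and DecodeStereo16 are made up for this illustration and are not part of FMOD.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One stereo sample: the left and right channel values.
struct StereoFrame {
    std::int16_t left;
    std::int16_t right;
};

// Reads a little-endian signed 16-bit value from two bytes.
// Works regardless of host byte order and pointer alignment.
inline std::int16_t ReadInt16LE(const std::uint8_t *p) {
    const std::uint16_t u = static_cast<std::uint16_t>(p[0] | (p[1] << 8));
    return static_cast<std::int16_t>(u);
}

// Splits interleaved 16-bit stereo PCM (L, R, L, R, ...) into frames.
// Each frame takes 4 bytes; trailing bytes that do not form
// a complete frame are ignored.
std::vector<StereoFrame> DecodeStereo16(const std::uint8_t *data, std::size_t bytes) {
    assert(data != nullptr || bytes == 0);
    std::vector<StereoFrame> frames;
    frames.reserve(bytes / 4);
    for (std::size_t i = 0; i + 4 <= bytes; i += 4)
        frames.push_back({ReadInt16LE(data + i), ReadInt16LE(data + i + 2)});
    return frames;
}
```

Reading the two bytes explicitly is a bit more verbose than casting the pointer to short, but it does not depend on the CPU being little-endian.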

To make a spectrum, one needs to transform the data from the time domain to the frequency domain, that is, to perform the Fourier transform. I used the FFTW library for this. Here is how I do it:

#include <fftw3.h>
#include <algorithm> // std::min
#include <cmath>     // cosf
#include <cstring>   // memcpy
#pragma comment(lib, "libfftw3f-3.lib")

static const uint FFT_INPUT_SAMPLES = 4096;
static const uint FFT_OUTPUT_SAMPLES = (FFT_INPUT_SAMPLES / 2) + 1;
static float g_InBuf[FFT_INPUT_SAMPLES], g_WindowedBuf[FFT_INPUT_SAMPLES];
static uint g_InBufPos = FFT_INPUT_SAMPLES/2;
static fftwf_complex g_OutBuf[FFT_OUTPUT_SAMPLES];
static fftwf_plan g_Plan;

// Helpers defined below.
void PostSamples(float *outSamples, const uint8 *inPcmData, uint sampleCount);
static void PostInBuf();

// Initialization
g_Plan = fftwf_plan_dft_r2c_1d(FFT_INPUT_SAMPLES, g_WindowedBuf, g_OutBuf, FFTW_ESTIMATE);
// The first half of the first window is silence.
for (uint i = 0; i < FFT_INPUT_SAMPLES/2; ++i)
  g_InBuf[i] = 0.f;

// Finalization
fftwf_destroy_plan(g_Plan);

// Processing function
void ProcessMusicBuf(const void *buf, uint bytes)
{
  const uint8 *bufBytes = (const uint8*)buf;
  uint samples = bytes / 4; // 4 bytes per stereo sample
  while (samples)
  {
    // Append as many samples as still fit into the window.
    uint samplesToCopy = std::min(samples, FFT_INPUT_SAMPLES - g_InBufPos);
    PostSamples(g_InBuf + g_InBufPos, bufBytes, samplesToCopy);
    g_InBufPos += samplesToCopy;
    if (g_InBufPos == FFT_INPUT_SAMPLES)
    {
      // The window is full: process it, then move its second half to the front.
      PostInBuf();
      memcpy(g_InBuf, g_InBuf + FFT_INPUT_SAMPLES/2, sizeof(float) * FFT_INPUT_SAMPLES / 2);
      g_InBufPos = FFT_INPUT_SAMPLES/2;
    }
    samples -= samplesToCopy;
    bufBytes += samplesToCopy * 4;
  }
}

// Helper functions

// Averages the left and right channel and rescales the result to -1..1.
void PostSamples(float *outSamples, const uint8 *inPcmData, uint sampleCount)
{
  for (uint i = 0; i < sampleCount; ++i, inPcmData += 4, ++outSamples)
  {
    short lSample = *(const short*)inPcmData;
    short rSample = *(const short*)(inPcmData + 2);
    *outSamples = ((float)lSample + (float)rSample) / 65536.f;
  }
}

static void PostInBuf()
{
  // PI is a float constant (3.14159265f) defined elsewhere.
  for (uint i = 0; i < FFT_INPUT_SAMPLES; ++i)
  {
    g_WindowedBuf[i] = g_InBuf[i];
    // Hamming window
    g_WindowedBuf[i] *= 0.54f - 0.46f * cosf( (PI * 2.f * i) / (FFT_INPUT_SAMPLES - 1) );
  }

  // Transform g_WindowedBuf into g_OutBuf.
  fftwf_execute(g_Plan);

  // I rewrite to g_OutBuf[i][0] squared absolute value of a complex number g_OutBuf[i].
  for (uint i = 0; i < FFT_OUTPUT_SAMPLES; ++i)
    g_OutBuf[i][0] = g_OutBuf[i][0]*g_OutBuf[i][0] + g_OutBuf[i][1]*g_OutBuf[i][1];
  // see below...
}

Here I have a window of 4096 samples sliding along the data. The PCM values (2 signed shorts) are averaged and rescaled to -1..1 for g_InBuf by the PostSamples function. g_InBuf is half filled with zeros at the beginning, and then each time 2048 new samples are read, the window is advanced by 2048 samples. So each time I process the buffer with PostInBuf, I have 2048 new samples on the right and 2048 old samples on the left. In other words, consecutive windows overlap by half. I process the buffer every 2048 samples, which means every 0.04644 s.
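
The arithmetic behind these numbers can be sketched as follows. This is only an illustration of the half-overlap scheme described above; HopSize, HopSeconds and WindowCount are hypothetical helper names that do not appear in my actual code.

```cpp
#include <cassert>
#include <cstddef>

// Number of new samples between consecutive windows
// when the windows overlap by half.
constexpr std::size_t HopSize(std::size_t windowSize) {
    return windowSize / 2;
}

// Time in seconds between two consecutive spectrogram columns.
inline double HopSeconds(std::size_t windowSize, double sampleRate) {
    assert(sampleRate > 0.0);
    return static_cast<double>(HopSize(windowSize)) / sampleRate;
}

// Number of windows processed for a signal of totalSamples samples,
// assuming the first window starts half filled with zeros, so that
// every full hop of new samples completes one window.
inline std::size_t WindowCount(std::size_t totalSamples, std::size_t windowSize) {
    assert(windowSize >= 2);
    return totalSamples / HopSize(windowSize);
}
```

For a 4096-sample window at 44100 Hz, HopSeconds gives 2048 / 44100, which is about 0.04644 s. Samples left over at the end that do not fill a whole hop are never processed in this scheme.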

I process g_InBuf into g_WindowedBuf using a Hamming window. There exist many window functions with different properties and my choice so far is rather arbitrary :)
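
Since the window coefficients do not depend on the signal, they could be computed once instead of calling cosf for every sample of every window. Here is a minimal sketch of that idea, using the same Hamming formula as above. MakeHammingWindow and ApplyWindow are names invented for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Precomputes n Hamming window coefficients:
// w[i] = 0.54 - 0.46 * cos(2 * pi * i / (n - 1)).
std::vector<float> MakeHammingWindow(std::size_t n) {
    assert(n >= 2);
    const double twoPi = 6.283185307179586;
    std::vector<float> window(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double phase = twoPi * static_cast<double>(i) / static_cast<double>(n - 1);
        window[i] = static_cast<float>(0.54 - 0.46 * std::cos(phase));
    }
    return window;
}

// Multiplies window.size() input samples by the window coefficients.
void ApplyWindow(const std::vector<float> &window, const float *in, float *out) {
    for (std::size_t i = 0; i < window.size(); ++i)
        out[i] = in[i] * window[i];
}
```

The window is symmetric: it equals 0.08 at both ends and 1 in the middle, so samples near the edges of each block contribute much less than those in the centre.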

Next I execute the FFT. The 4096 real numbers on input, processed by the FFTW dft_r2c_1d transform, give 4096/2+1 = 2049 complex numbers on output, which can be interpreted as frequencies from 0 to 22050 Hz with a step of 10.7666 Hz. I don't know yet how I could use the phase of these complex numbers, so I only measure their squared absolute values (real*real + imag*imag), which are proportional to the power at a particular frequency and at a particular point in time.
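
To check this interpretation without FFTW, here is a small sketch: a helper that maps an output index to its frequency, and a naive O(N^2) DFT that computes the same squared absolute values for a short test signal. It is for illustration only and far too slow for real use. BinFrequencyHz and PowerSpectrum are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Centre frequency in Hz of output bin number `bin`
// for a real-input transform of fftSize samples.
inline double BinFrequencyHz(std::size_t bin, std::size_t fftSize, double sampleRate) {
    assert(fftSize > 0 && bin <= fftSize / 2);
    return static_cast<double>(bin) * sampleRate / static_cast<double>(fftSize);
}

// Naive DFT of a real signal. Returns real*real + imag*imag
// for bins 0 .. n/2, like the r2c output described above.
std::vector<double> PowerSpectrum(const std::vector<double> &x) {
    const std::size_t n = x.size();
    assert(n > 0);
    const double twoPi = 6.283185307179586;
    std::vector<double> power(n / 2 + 1);
    for (std::size_t k = 0; k < power.size(); ++k) {
        double re = 0.0, im = 0.0;
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = -twoPi * static_cast<double>(k * t) / static_cast<double>(n);
            re += x[t] * std::cos(angle);
            im += x[t] * std::sin(angle);
        }
        power[k] = re * re + im * im;
    }
    return power;
}
```

Feeding it a cosine that completes exactly 2 cycles in 16 samples puts all the power into bin 2, and BinFrequencyHz(2048, 4096, 44100.0) gives 22050 Hz for the last bin.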

Finally, to visualize my spectrum I use the FreeImage library. It can allocate, fill and then save a bitmap in many different image file formats (like BMP, PNG, JPEG). Here is the code:

#include <FreeImage.h>
#pragma comment(lib, "FreeImage.lib")

static const uint imgSizeX = 4759;
static const uint imgSizeY = 256;
static FIBITMAP *bitmap;
static uint g_ImageX; // Current image column; starts at 0.

// Initialization
bitmap = FreeImage_Allocate((int)imgSizeX, (int)imgSizeY, 24);

// Finalization
FreeImage_Save(FIF_BMP, bitmap, "G:\\tmp\\Spectrogram.bmp");
FreeImage_Unload(bitmap);

static void PostInBuf()
{
  // see above...
  if (g_ImageX < imgSizeX)
  {
    uint samplesPerPixel = FFT_OUTPUT_SAMPLES / imgSizeY;
    float val;
    RGBQUAD pixel = {};
    for (uint y = 0, sample = 0; y < imgSizeY; ++y)
    {
      // Average values over samplesPerPixel numbers.
      val = 0.f;
      for (uint localSample = 0; localSample < samplesPerPixel; ++localSample, ++sample)
        val += g_OutBuf[sample][0];
      val /= (float)samplesPerPixel;
      val = log10f(val + 1.f); // Logarithm.
      val *= 0.4f / 3.f; // Manually selected scaling.
      // common::minmax is a helper from outside this snippet.
      val = common::minmax(0.f, val, 1.f); // Clamp.
      val = powf(val, 1.f/2.2f); // Gamma (a computer graphics thing :)
      pixel.rgbRed = pixel.rgbGreen = pixel.rgbBlue = (uint8)(val * 255.f + 0.5f);
      FreeImage_SetPixelColor(bitmap, g_ImageX, y, &pixel);
    }
    // Each processed window fills one column of the image.
    ++g_ImageX;
  }
}
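
The mapping from an averaged value to a gray level can be isolated into a single function, which makes it easier to experiment with the scaling. This is a sketch of the same four steps (logarithm, scaling, clamp, gamma) using only the standard library. PowerToGray is a name made up for this example, and std::min/std::max stand in for my clamp helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Maps a non-negative squared magnitude to a gray level 0..255.
std::uint8_t PowerToGray(float power) {
    assert(power >= 0.f);
    float val = std::log10(power + 1.f);      // Logarithm; +1 keeps the result >= 0.
    val *= 0.4f / 3.f;                        // Manually selected scaling.
    val = std::min(std::max(val, 0.f), 1.f);  // Clamp to 0..1.
    val = std::pow(val, 1.f / 2.2f);          // Gamma.
    return static_cast<std::uint8_t>(val * 255.f + 0.5f); // Round to nearest.
}
```

With this scaling, silence maps to black and the output saturates to white once log10(power + 1) reaches 7.5.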


I know these algorithms could be optimized, parallelized and so on, but now I'm more concerned about what to do with this further. What information can be extracted from music to be used for generating gameplay like in AudioSurf? What are good articles about this on the Internet?

The spectrogram I made already reveals some regular features, so I think it could now be processed as an image :) But 2D image recognition and processing is another broad topic...

Another of my questions for those who managed to read up to this point is: wouldn't it be faster and more convenient to do such math research in Scilab instead of C++? Is it possible to do such things in Scilab at all?

Tags: math libraries music rendering dsp algorithms | Author: Adam Sawicki


Copyright © 2004-2017 Adam Sawicki