Is Minim's FFT non-deterministic?

I’m trying to create a music visualizer that takes a song and outputs an image.

I’m very new to this, and I’ve been trying to understand FFT. I put together this sketch:

import ddf.minim.*;
import ddf.minim.analysis.*;
import ddf.minim.effects.*;
import ddf.minim.signals.*;
import ddf.minim.spi.*;
import ddf.minim.ugens.*;

Minim minim;
FilePlayer player;
AudioOutput out;
FFT fft;

float x = 0;

void setup() {
  size(1024, 300);

  minim = new Minim(this);

  player = new FilePlayer(minim.loadFileStream("song.mp3"));
  out = minim.getLineOut();
  player.patch(out);

  fft = new FFT(1024, player.sampleRate());
  player.play();

  background(32);
}

void draw() {
  stroke(255);
  fill(255);

  x += .075;

  fft.forward(out.mix);
  float[] f = fft.getSpectrumReal();
  float y = height / 2 + average(f) * 100;
  line(x, height / 2, x, y);
}

float average(float[] array) {
  float a = 0;
  for (float f : array) {
    a += f;
  }
  a /= array.length;
  return a;
}

This sketch plays a song and draws lines based on the average of the values returned by fft.getSpectrumReal().

I’m not saying this is pretty or even makes any sense, but I just wanted to get something working.

What I’m surprised by is that if I run the same sketch multiple times using the same song, I get different results each time.

It’s hard to see in the above images, so here they are flipping back and forth:

If I run it a third time, I get a third slightly different result. This might not seem like a big deal, but my goal is to create a reproducible result, and more importantly I’d like to understand why this is happening.

I thought this might be because the song is playing at a different rate than the frame rate, so I tried adding this to the setup function:

player.setSampleRate(60);

But that appears to have no effect.

I’ve also eliminated the possibility that this was coming from other sounds playing on my computer, or the mic picking up other sounds. (Neither seem to affect the output.)

My main question is: Why does FFT generate different results for the same song?

Like I said, I’m very new to all of this, so I also have some bonus questions:

  • I thought FFT generated a count of frequencies, but many of the values are negative. How can a count be negative?
  • Does the array returned by fft.getSpectrumReal() represent a single instant of time, or does it represent a time range? E.g. if I call it once per second, does it represent all of the sound played during that second, or is it the sound playing right at that exact moment?
  • Is it true that the array returned by fft.getSpectrumReal() maps to different “ranges”? Like is the first part of the array the bass, the second part of the array the mids, and the third part of the array treble? (Very roughly?)
  • Is there a more reasonable way to get values that represent a song? I saw that the BeatDetect class had some interesting functions: isHat, isKick, isSnare. I know it’s probably not this simple, but is there a way to get, say, the voice level, or the guitar level?

I’ve been reading about FFT for a couple of days, but honestly I get lost in the math pretty quickly. I appreciate any resources geared towards non-mathy people you can send my way!

I am willing to bet the change I have in my right pocket that this is due to rounding errors. I will try to reproduce it and report back this weekend.

The reason for the negative values is that you need to work with the magnitude of your signal. In other words, an FFT operation returns real and imaginary values, and the actual frequency amplitude is the magnitude of each pair: sqrt(re**2+im**2). There are some extra details here, which also explain that there is a normalization factor you need to take into account when calculating the final amplitude: https://dsp.stackexchange.com/questions/20500/negative-values-of-the-fft
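To make the magnitude step concrete, here is a minimal plain-Java sketch. It is not Minim's own code: the class name SpectrumMagnitude and the sample values are made up for illustration. In a Processing sketch the two input arrays would presumably come from fft.getSpectrumReal() and fft.getSpectrumImaginary(), and as far as I know fft.getBand(i) already returns this amplitude for band i.

```java
// Plain-Java illustration (hypothetical class, not part of Minim).
class SpectrumMagnitude {
    // Magnitude of one complex FFT bin: sqrt(re^2 + im^2). Never negative.
    static float magnitude(float re, float im) {
        return (float) Math.sqrt(re * re + im * im);
    }

    // Magnitudes for a whole spectrum; both arrays must have the same length.
    static float[] magnitudes(float[] real, float[] imag) {
        float[] out = new float[real.length];
        for (int i = 0; i < real.length; i++) {
            out[i] = magnitude(real[i], imag[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up bin values; note the negative real and imaginary parts.
        float[] real = { 3f, -3f, 0f };
        float[] imag = { 4f, 4f, -2f };
        for (float m : magnitudes(real, imag)) {
            System.out.println(m);
        }
    }
}
```

Averaging only the real parts, as the original sketch does, mixes positive and negative numbers; averaging the magnitudes gives a value that is never negative.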

FFT and DFT algorithms attempt to model your input signal as a sum of sine and cosine functions (in simple terms, as I understand it). The discrete FT applies to signals that are discrete in nature, for example a signal read and processed by an ADC. Theoretically one can also work with continuous signals, FYI.

What do I expect from an FFT spectrum?

If the initial signal is a mix of known frequencies, each with its own amplitude, then the FFT outputs a signal in the frequency domain that you can plot as frequency vs. amplitude. The positions on the X axis indicate the frequencies involved in the mix. The Y axis is the amplitude of each of those frequencies. If you calculate the magnitude, you should get back the actual amplitudes of the frequencies in the original input signal. Going back to what an FFT is, you can think of the data in the frequency domain as the factors used in the model that the FFT operation generates. Remember, the FFT creates a representation of your input signal. It is written something like this:

  • Theta: input signal
  • Function g: FFT operation

g(theta) = SUM( a*f(v1) + b*f(v2) + c*f(v3) + ...)

Where v1, v2, v3,…,v_n represent the frequencies obtained from the operation and a,b,c,…[X_n] represent the amplitudes of those frequencies. Notice that an amplitude is zero when its frequency does not contribute to the initial input signal. No surprises here, as this is what is expected.
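A small plain-Java sketch of this idea, under some assumptions: it uses a naive DFT rather than a real FFT (same result, just slower), the class and method names are made up, and the test signal is a mix of two sine waves that each fit a whole number of cycles into the window. With that setup the normalized magnitudes come back as the amplitudes that went in, and the other bins stay near zero.

```java
// Plain-Java illustration (hypothetical class): recover the amplitudes
// of a known two-tone mix with a naive O(N^2) DFT.
class TwoToneDft {
    // Builds a*sin(2*pi*v1*t/n) + b*sin(2*pi*v2*t/n) for t = 0..n-1.
    // v1 and v2 are whole cycles per window, so each lands in exactly one bin.
    static double[] twoTones(int n, int v1, double a, int v2, double b) {
        double[] signal = new double[n];
        for (int t = 0; t < n; t++) {
            signal[t] = a * Math.sin(2 * Math.PI * v1 * t / n)
                      + b * Math.sin(2 * Math.PI * v2 * t / n);
        }
        return signal;
    }

    // Returns the normalized amplitude for bins 0..n/2 (n is assumed even).
    static double[] amplitudeSpectrum(double[] signal) {
        int n = signal.length;
        double[] amp = new double[n / 2 + 1];
        for (int k = 0; k <= n / 2; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double angle = 2 * Math.PI * k * t / n;
                re += signal[t] * Math.cos(angle);
                im -= signal[t] * Math.sin(angle);
            }
            double mag = Math.sqrt(re * re + im * im);
            // Normalization factor: 1/n for the DC and Nyquist bins, 2/n otherwise.
            amp[k] = (k == 0 || k == n / 2) ? mag / n : 2 * mag / n;
        }
        return amp;
    }

    public static void main(String[] args) {
        double[] amp = amplitudeSpectrum(twoTones(64, 4, 1.0, 10, 0.5));
        for (int k = 0; k < amp.length; k++) {
            System.out.printf("bin %d: %.3f%n", k, amp[k]);
        }
    }
}
```

Real music rarely lines up with the window this neatly, so in practice the energy of one tone spreads over a few neighbouring bins.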


An FFT does not represent an exact instant in time but a time range. It takes an array of samples in the time domain and transforms it into another array in the frequency domain. In your case, I would expect that if the input signal is the same and you use the same transform operation, you should get the same frequency representation. I can see in Wikipedia that accuracy is an issue due to floating-point errors.

I have to look up the definitions of bass and treble. I can tell you that the frequency spectrum represents frequencies from low to high, starting from zero. I believe the zero-frequency bin represents the average amplitude of your initial signal, and in some cases the value in this first bin is of no use. In addition, I believe only half of the frequency spectrum is valid, as the other half is a mirror of the first half. However, that really depends on how the FFT operation is defined, i.e. the algorithm used by the library.
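As a rough sketch of how bins map to frequencies and how much time one transform covers, here is some plain-Java arithmetic. The class name is made up, and the 44100 Hz sample rate is an assumption (the actual rate of song.mp3 is not stated in this thread); the 1024 matches the FFT size in the original sketch. I believe Minim exposes the same information through fft.specSize() and fft.indexToFreq(i).

```java
// Plain-Java illustration (hypothetical class): FFT bin bookkeeping.
class FftBins {
    // Number of non-mirrored bins for a real input of timeSize samples.
    static int usefulBins(int timeSize) {
        return timeSize / 2 + 1;
    }

    // Width of one bin in Hz.
    static double binWidthHz(int timeSize, double sampleRate) {
        return sampleRate / timeSize;
    }

    // Frequency in Hz at the start of the given bin; bin 0 is the zero-frequency (DC) bin.
    static double binToFrequency(int bin, int timeSize, double sampleRate) {
        return bin * sampleRate / timeSize;
    }

    // Length of audio, in milliseconds, covered by one transform.
    static double windowMillis(int timeSize, double sampleRate) {
        return 1000.0 * timeSize / sampleRate;
    }

    public static void main(String[] args) {
        int timeSize = 1024;          // FFT size from the original sketch
        double sampleRate = 44100.0;  // assumed sample rate
        System.out.println("bins: " + usefulBins(timeSize));
        System.out.println("bin width (Hz): " + binWidthHz(timeSize, sampleRate));
        System.out.println("last bin (Hz): " + binToFrequency(usefulBins(timeSize) - 1, timeSize, sampleRate));
        System.out.println("window (ms): " + windowMillis(timeSize, sampleRate));
    }
}
```

Under those assumptions each transform covers roughly 23 ms of audio split into 513 bins about 43 Hz wide, with the lowest frequencies at the start of the array and the highest (half the sample rate) at the end.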

I would think it is possible. You might need to apply a filter to the initial signal to remove high frequencies. I would also adjust the number of samples and the sample rate of the transform operation. I used an FFT a while back. I will see if I can find the code and if I can get it going.

Kf


Hi Kevin,
I was able to reproduce this. In the image below you can see your code playing the same short mp3 file (jingle.mp3) twice, and by visual inspection one can see the results are not the same. I made a slight modification:

  fft.forward(out.mix);
  println(frameCount, player.position());  //NEW

and I can see that the file player does not return the same positions from one run to the next. As an example, running the sketch twice returned the following values (frameCount, then player.position()):

Try 1

1 0
2 0
3 46
4 116
5 116
6 162
7 162

Try 2

1 0
2 0
3 46
4 92
5 116
6 116
7 162
8 162

FFT should give you reproducible results if you use the same data. Minim has other examples (for example, contributed libraries/Minim/Analysis/offlineAnalysis) that load the data as a whole array and process it one chunk at a time. This would work better for your case.
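The chunk-by-chunk idea can be sketched in plain Java as follows. This is only an illustration under assumptions, not the Minim example itself: the class and method names are made up, the samples are synthetic instead of being loaded from a file, and a simple RMS level stands in for the FFT step so the code has no dependencies. The point is that the chunks are cut at fixed sample offsets instead of wherever the player happens to be when draw() runs, so every run sees exactly the same data.

```java
import java.util.Arrays;

// Plain-Java illustration (hypothetical class): offline, chunked analysis.
class OfflineChunks {
    // Cuts samples into consecutive chunks of chunkSize (zero-padding the
    // last one) and computes one value per chunk.
    static double[] analyze(float[] samples, int chunkSize) {
        int totalChunks = (samples.length + chunkSize - 1) / chunkSize;
        double[] levels = new double[totalChunks];
        float[] chunk = new float[chunkSize];
        for (int c = 0; c < totalChunks; c++) {
            int start = c * chunkSize;
            int len = Math.min(chunkSize, samples.length - start);
            System.arraycopy(samples, start, chunk, 0, len);
            Arrays.fill(chunk, len, chunkSize, 0f); // pad a short final chunk
            // Stand-in for the FFT step: in a real sketch this is where
            // the transform would be run on the chunk.
            levels[c] = rms(chunk);
        }
        return levels;
    }

    // Root-mean-square level of one chunk.
    static double rms(float[] chunk) {
        double sum = 0;
        for (float s : chunk) {
            sum += s * s;
        }
        return Math.sqrt(sum / chunk.length);
    }

    // Synthetic test signal standing in for the decoded song.
    static float[] syntheticSamples(int n) {
        float[] s = new float[n];
        for (int i = 0; i < n; i++) {
            s[i] = (float) Math.sin(2 * Math.PI * i / 100.0);
        }
        return s;
    }

    public static void main(String[] args) {
        float[] samples = syntheticSamples(2500);
        double[] first = analyze(samples, 1024);
        double[] second = analyze(samples, 1024);
        System.out.println(first.length + " chunks, identical runs: "
                + Arrays.equals(first, second));
    }
}
```

In the original sketch, by contrast, each frame analyzes whatever is in out.mix at that moment, and the position log above shows that this is not the same from run to run.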

Kf


Are you sure the line out is completely silent?

player = new FilePlayer(minim.loadFileStream("song.mp3"));
out = minim.getLineOut();
player.patch(out);

What happens if you load a completely blank file?
Also, it seems like the setup is taking a file and loading the stream into the player, then listening to the output of that. I would think the difference in sampling due to the computer’s load at the time would also introduce some sort of jitter.

If you were doing a pure FFT of the entire file like with some of the libraries @kfrajer mentioned you should get the same results each time. However, in your code here it looks like the setup has an output-observer which has some inherent differences each time you run it.

Taking a look at the source code may also help: link. I have no idea if this is the same version that Processing currently uses.