I’m trying to create a music visualizer that takes a song and outputs an image.
I’m very new to this, and I’ve been trying to understand FFT. I put together this sketch:
import ddf.minim.*;
import ddf.minim.analysis.*;
import ddf.minim.effects.*;
import ddf.minim.signals.*;
import ddf.minim.spi.*;
import ddf.minim.ugens.*;

Minim minim;
FilePlayer player;
AudioOutput out;
FFT fft;
float x = 0;

void setup() {
  size(1024, 300);
  minim = new Minim(this);
  player = new FilePlayer(minim.loadFileStream("song.mp3"));
  out = minim.getLineOut();
  player.patch(out);
  fft = new FFT(1024, player.sampleRate());
  player.play();
  background(32);
}

void draw() {
  stroke(255);
  fill(255);
  x += .075;
  // analyze whatever samples are currently in the output buffer
  fft.forward(out.mix);
  float[] f = fft.getSpectrumReal();
  float y = height / 2 + average(f) * 100;
  line(x, height / 2, x, y);
}

// mean of all the values in the array
float average(float[] array) {
  float a = 0;
  for (float f : array) {
    a += f;
  }
  a /= array.length;
  return a;
}
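For reference, here's the averaging helper pulled out into plain Java with a made-up input array (the values are hypothetical, just to confirm the math does what I expect):

```java
// Plain-Java check of the average() helper from the sketch above.
public class AverageCheck {
    static float average(float[] array) {
        float a = 0;
        for (float f : array) {
            a += f;
        }
        a /= array.length;
        return a;
    }

    public static void main(String[] args) {
        // made-up values; note the mix of negative and positive entries,
        // like what I see coming back from getSpectrumReal()
        float[] f = { -0.5f, 0.25f, 0.25f, 1.0f };
        System.out.println(average(f)); // prints 0.25
    }
}
```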
This sketch plays a song and draws lines based on the average value returned by fft.getSpectrumReal().
I’m not saying this is pretty or even makes any sense, but I just wanted to get something working.
What I’m surprised by is that if I run the same sketch multiple times using the same song, I get different results each time.
It’s hard to see in the above images, so here they are flipping back and forth:
If I run it a third time, I get a third slightly different result. This might not seem like a big deal, but my goal is to create a reproducible result, and more importantly I’d like to understand why this is happening.
I thought this might be because the song is playing at a different rate than the frame rate, so I tried adding this to the setup function:
player.setSampleRate(60);
But that appears to have no effect.
I’ve also eliminated the possibility that this was coming from other sounds playing on my computer, or the mic picking up other sounds. (Neither seem to affect the output.)
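As a sanity check on the math itself (plain Java, not Minim, and a made-up signal), I computed a tiny direct DFT of a fixed buffer twice and got identical results both times, which makes me suspect the transform is deterministic and it's the input samples that differ between runs:

```java
import java.util.Arrays;

// Tiny direct DFT (not a fast FFT) over a fixed, made-up buffer.
// Running it twice on the same input produces identical output, so the
// transform itself shouldn't be the source of run-to-run differences.
public class DftCheck {
    // real part of each frequency bin
    static double[] dftReal(double[] x) {
        int n = x.length;
        double[] re = new double[n];
        for (int k = 0; k < n; k++) {
            for (int t = 0; t < n; t++) {
                re[k] += x[t] * Math.cos(2 * Math.PI * k * t / n);
            }
        }
        return re;
    }

    public static void main(String[] args) {
        double[] signal = new double[8];
        for (int t = 0; t < 8; t++) {
            signal[t] = Math.sin(2 * Math.PI * t / 8); // one cycle of a sine
        }
        double[] first = dftReal(signal);
        double[] second = dftReal(signal);
        System.out.println(Arrays.equals(first, second)); // prints true
    }
}
```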
My main question is: Why does FFT generate different results for the same song?
Like I said, I’m very new to all of this, so I also have some bonus questions:
- I thought FFT generated a count of frequencies, but many of the values are negative. How can a count be negative?
- Does the array returned by fft.getSpectrumReal() represent a single instant of time, or does it represent a time range? E.g. if I call it once per second, does it represent all of the sound played during that second, or is it the sound playing right at that exact moment?
- Is it true that the array returned by fft.getSpectrumReal() maps to different “ranges”? Like is the first part of the array the bass, the second part of the array the mids, and the third part of the array the treble? (Very roughly?)
- Is there a more reasonable way to get values that represent a song? I saw that the BeatDetect class has some interesting functions: isHat, isKick, isSnare. I know it’s probably not this simple, but is there a way to get, say, the voice level, or the guitar level?
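For the negative-values question, I poked at it with another tiny plain-Java experiment (again a made-up signal, not Minim): the real part of a DFT bin comes out negative when the cosine in the input is flipped, which makes me suspect these are signed coefficients rather than counts:

```java
// Shows that the real part of a DFT bin can be negative: it depends on
// the sign/phase of the input, not just on "how much" of a frequency is present.
public class NegativeBins {
    // real part of bin k of a direct DFT
    static double realBin(double[] x, int k) {
        double re = 0;
        int n = x.length;
        for (int t = 0; t < n; t++) {
            re += x[t] * Math.cos(2 * Math.PI * k * t / n);
        }
        return re;
    }

    public static void main(String[] args) {
        int n = 8;
        double[] cos = new double[n];
        double[] flipped = new double[n];
        for (int t = 0; t < n; t++) {
            cos[t] = Math.cos(2 * Math.PI * t / n); // one cycle of a cosine
            flipped[t] = -cos[t];                   // same frequency, opposite sign
        }
        System.out.println(realBin(cos, 1) > 0);     // prints true
        System.out.println(realBin(flipped, 1) < 0); // prints true
    }
}
```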
I’ve been reading about FFT for a couple of days, but honestly I get lost in the math pretty quickly. I appreciate any resources geared towards non-mathy people you can send my way!