Yesterday, I started writing a 2D game engine from scratch in JAVA, and I was benchmarking every single line of code I added to my class to keep the framerate high. Then after around 2 hours of writing my code I came across a huge drop in FPS when I used a local variable to write into an int array inside a for loop. My FPS went from an average of 14,000+, and then dropped down to an average of 700+ fps (20 times slower).
There is a single line of code commented out in the âsetPixelsâ method. Please try the benchmark with the code as it currently is, and then again by uncommenting the line of code. The second benchmark should be around 20 times faster than before.
All the commented line of code does is reassign the local âwâ variable to the exact same value that was passed into the âsetPixelsâ method, however, this then causes an increase in performance by 20 times faster. Please help me understand why this is happening, because this huge change in framerate has me confused. Thanks.
void setup() {
size(640, 480);
loadPixels();
benchmark();
}
void benchmark() {
int startTime = millis();
int framesPerSecond = 0;
while (true) {
setPixels(width);
if (millis() - startTime >= 1000) {
println("fps: " + framesPerSecond);
framesPerSecond = 0;
startTime = millis();
}
framesPerSecond++;
}
}
void setPixels(int w) {
//w = 640;
for (int i = 0; i < pixels.length; i++) {
int x = i % w;
int y = i / w;
int index = x + (y * w);
pixels[index] = 0xff9900;
}
}
Thatâs interesting! Apparently, itâs the same result if you write the constant directly instead of via a local variable.
void setPixels(int w) {
//w = 640;
for (int i = 0; i < pixels.length; i++) {
int x = i % 640;
int y = i / 640;
int index = x + (y * 640);
pixels[index] = 0xff9900;
}
}
Iâm guessing the compiler is smart enough to know that it doesnât change value in the function, and replaces it with a constant. Same happens if you define w as a constant (final), in or outside of the function (dont use it as a parameter in setPixels then).
But if you use a global variable and set it to 640, it doesnât help with performance.
I donât know why, but Iâm guessing fetching variable values is expensive? Iâm sure others here have a better explanation.
not an expert at all⌠but, the delay or non-delay does not take place through the commented line but the loop takes longer when w=640; is not used. Hard to measure since its small numbers, but the loop takes 1 ms instead of "0"ms⌠very interesting though. rarons idea seems to deserve merit imo.
void setPixels(int w) {
start1 = millis();
//w = 640;
end1 = millis()-start1;
println (end1);
start1 = millis();
for (int i = 0; i < pixels.length; i++) {
int x = i % w;
int y = i / w;
int index = x + (y * w);
pixels[index] = 0xff9900;
}
end1 = millis()-start1;
println ("loop: "+end1);
}
This was just an example, it has no purpose other than to show the FPS difference between the single line of code being commented / uncommented. I stumbled across this performance problem when working on another project, which has different code to what I have shared on here.
I still think that it has something to do with how you access the pixel array because if you use the same principle/concept but including height as well in the same matter, then there is no difference.
void setup() {
size(640, 480);
loadPixels();
benchmark();
}
void benchmark() {
int startTime = millis();
int framesPerSecond = 0;
while (true) {
setPixels(width, height);
if (millis() - startTime >= 1000) {
println("fps: " + framesPerSecond);
framesPerSecond = 0;
startTime = millis();
}
framesPerSecond++;
}
}
void setPixels(int w, int h) {
//w = 640;
for (int y = 0; y < h; y++ ) {
for (int x = 0; x < w; x++ ) {
int loc = x+y*w;
pixels[loc] = 0xff9900;
}
}
}
Unfortunately, using constants is not the reason for the increase in FPS speed. Here is a small change in the code that demonstrates this. If the value of âw2â is changed to 640 (which is the exact same value as âwâ), then the FPS increases by 20 times faster.
void setPixels(int w) {
final int w2 = w;
for (int i = 0; i < pixels.length; i++) {
int x = i % w2;
int y = i / w2;
int index = x + (y * w2);
pixels[index] = 0xff9900;
}
}
I think I should mention in this thread, Iâm not trying to do anything useful with the âsetPixelsâ method. This sketch I have posted was simply an example I came up with to demonstrate a speed problem that I discovered when using variables in a for loop.
Since using a single variable that doesnât change in value causes a FPS drop by at least 20 times, this is a BIG DEAL, since there are a lot of sketches that could run considerably faster if we can find out what is causing this problem, and modify the code in those sketches to avoid the speed problem.
I forgot to include adamskii_uk 's method which is faster than the others except the very fast for GoToLoopâs method of course.
int start;
void setup() {
size(500, 500);
background(255);
fill(0);
loadPixels();
start = millis();
for (int i = 0; i < 10000; i++) {
for (int y = 0; y < height; y++ ) {
for (int x = 0; x < width; x++ ) {
int loc = x+y*width;
pixels[loc] = color(0);
}
}
}
println("loc = "+(millis()-start));
start = millis();
for (int j = 0; j < 10000; j++) {
for (int i = 0; i < pixels.length; i++) {
int x = i % 500;
int y = i / 500;
int index = x + (y * 500);
pixels[index] = color(0);
}
}
println("% / = "+(millis()-start));
updatePixels();
}
Leaving aside how bad the benchmarks are (no warmup, use of millis(), etc.) two options come to mind.
Non-sequential array access - doesnât seem to be here, may be in original code? CPU caches are optimized for accessing contiguous memory.
No bounds check elimination - all array access is bounds checked in Java, but the VM will eliminate the bounds checks where it can easily prove index values do not go out of range. Itâs possible that this is not happening in the slower case.
Iâve made another discovery after making some edits to my original code. Hereâs the modified code:
int[] values;
void setup() {
size(640, 480);
values = new int[width * height];
fpsLoop();
}
void fpsLoop() {
long startTime = System.nanoTime();
int framesPerSecond = 0;
while (true) {
setValues(1);
if (System.nanoTime() - startTime >= 1000000000) {
println("fps: " + framesPerSecond);
framesPerSecond = 0;
startTime = System.nanoTime();
}
framesPerSecond++;
}
}
void setValues(int step) {
for (int i = 0; i < values.length; i += step) {}
}
I have created a âsetValuesâ method that takes a single argument called âstepâ and I have passed in a value of 1 into the method. The local âstepâ variable is then used directly in the for loop counter, to increment the âiâ value by 1 each time. When playing the sketch, the framerate starts at a consistant 7700+ fps, but then after approx 15 seconds, the framerate shoots up to 52 million+ fps. This could be the âwarmupâ part that neilcsmith mentioned in the previous post.
But, then I created a new int variable at the top of the âfpsLoopâ method and gave it a value of 1. I then passed this new variable as a parameter into the âsetValuesâ method. This time the framerate starts at 7700+ just like before, but then after approx 15 seconds then framerate drops down to 3800+ and stays there.
Iâve decompiled and compared the bytecode generated with the original code, with w = 640;, and with if (w != 640) throw new RuntimeException();.
My findings:
Thereâs no optimisation happening at the compilation, the added code is simply added to the bytecode.
The same happens in plain Java, outside of Processing.
The same happens without using any arrays.
The real magic happen at execution, the JVM is somehow able to take shortcuts to make the code run faster.
Hereâs plain Java code which produces the same behaviour:
static void benchmark() {
for (int i = 0; i < 10; i++) {
long start = System.currentTimeMillis();
setPixels(640);
System.out.printf("%4d\n", System.currentTimeMillis() - start);
}
}
static long sum = 0;
static void setPixels(int w) {
// w = 640;
if (w != 640) {
throw new RuntimeException();
}
for (long i = 0; i < 200000000; i++) {
long index = i / w;
sum += index;
}
}
Which outputs:
1499
1524
208
211
208
205
208
204
204
205
The first 2 calls of the methods take as long as they do without the optimisation, then the rest are much faster.
You can run the code without any runtime optimisation:
java -Djava.compiler=NONE MyClass
This will take much longer, and wonât get faster by uncommenting the magic code.
Yes, this is exactly how the JVM works. Bytecode is compiled to native code and continually optimized at runtime. The Java compiler generally does no optimization - this allows for code to run faster on newer JVMs. Thereâs a massive amount of âmagicâ that happens when running - code reordered, loops unrolled, inlined methods, objects never created (escape analysis), fields cached locally, branch prediction, speculative optimization (eg. assuming method calls are always receiving same type).
It can also lead to some odd effects - eg. quite a bit of naive multi-threaded code around here is broken because people assume values arenât cached or code executes in the order itâs written.
Likewise, the JVM traps segfaults to catch some occasions where itâs made the wrong assumption.
Likewise, did you know that an object can be garbage collected before its constructor has even finished executing?!
Thereâs a whole rabbit hole of interesting info to explore around Java runtime optimizations!