Why does uncommenting this line of code increase sketch performance by 20 times faster?

Hi :smiley:

Yesterday, I started writing a 2D game engine from scratch in JAVA, and I was benchmarking every single line of code I added to my class to keep the framerate high. Then after around 2 hours of writing my code I came across a huge drop in FPS when I used a local variable to write into an int array inside a for loop. My FPS went from an average of 14,000+, and then dropped down to an average of 700+ fps (20 times slower).

There is a single line of code commented out in the ‘setPixels’ method. Please try the benchmark with the code as it currently is, and then again by uncommenting the line of code. The second benchmark should be around 20 times faster than before.

All the commented line of code does is reassign the local ‘w’ variable to the exact same value that was passed into the ‘setPixels’ method, however, this then causes an increase in performance by 20 times faster. Please help me understand why this is happening, because this huge change in framerate has me confused. Thanks. :slight_smile:

void setup() {
  size(640, 480);
  loadPixels();
  benchmark();
}

void benchmark() {
  int startTime = millis();
  int framesPerSecond = 0;
  
  while (true) {
    setPixels(width);

    if (millis() - startTime >= 1000) {
      println("fps: " + framesPerSecond);
      framesPerSecond = 0;
      startTime = millis();
    }

    framesPerSecond++;
  }
}

void setPixels(int w) {
  //w = 640;

  for (int i = 0; i < pixels.length; i++) {
    int x = i % w;
    int y = i / w;
    int index = x + (y * w);
    pixels[index] = 0xff9900;
  }
}
3 Likes

Hi, and welcome!

That’s interesting! Apparently, it’s the same result if you write the constant directly instead of via a local variable.

void setPixels(int w) {
  //w = 640;

  for (int i = 0; i < pixels.length; i++) {
    int x = i % 640;
    int y = i / 640;
    int index = x + (y * 640);
    pixels[index] = 0xff9900;
  }
}

I’m guessing the compiler is smart enough to know that it doesn’t change value in the function, and replaces it with a constant. Same happens if you define w as a constant (final), in or outside of the function (dont use it as a parameter in setPixels then).

But if you use a global variable and set it to 640, it doesn’t help with performance.

I don’t know why, but I’m guessing fetching variable values is expensive? I’m sure others here have a better explanation.

4 Likes

not an expert at all… but, the delay or non-delay does not take place through the commented line but the loop takes longer when w=640; is not used. Hard to measure since its small numbers, but the loop takes 1 ms instead of "0"ms… very interesting though. rarons idea seems to deserve merit imo.

void setPixels(int w) {
  start1 = millis();
  //w = 640;
  end1 = millis()-start1;
  println (end1);
  start1 = millis();
  for (int i = 0; i < pixels.length; i++) {
    int x = i % w;
    int y = i / w;
    int index = x + (y * w);
    pixels[index] = 0xff9900;
  }
  end1 = millis()-start1;
  println ("loop: "+end1);
}

I think the comments are valid, but I wonder what is the purpose of your code?
If you substitute it with the usual, it is as fast.

for (int y = 0; y < height; y++ ) { 
    for (int x = 0; x < width; x++ ) { 
      int loc = x+y*width;
      pixels[loc] = 0xff9900;
    }
  }

This was just an example, it has no purpose other than to show the FPS difference between the single line of code being commented / uncommented. I stumbled across this performance problem when working on another project, which has different code to what I have shared on here. :smiley:

2 Likes

I still think that it has something to do with how you access the pixel array because if you use the same principle/concept but including height as well in the same matter, then there is no difference.

void setup() {
  size(640, 480);
  loadPixels();
  benchmark();
}

void benchmark() {
  int startTime = millis();
  int framesPerSecond = 0; 
  while (true) {
    setPixels(width, height);
    if (millis() - startTime >= 1000) {
      println("fps: " + framesPerSecond);
      framesPerSecond = 0;
      startTime = millis();
    }
    framesPerSecond++;
  }
}

void setPixels(int w, int h) {
  //w = 640;
  for (int y = 0; y < h; y++ ) { 
    for (int x = 0; x < w; x++ ) { 
      int loc = x+y*w;
      pixels[loc] = 0xff9900;
    }
  }
}
1 Like

If you just wanna fill up an array w/ a single value simply use Arrays.fill() method: :wink:
java.util.Arrays.fill(pixels, 0xff9900);
Docs.Oracle.com/en/java/javase/11/docs/api/java.base/java/util/Arrays.html#fill(int[],int)

1 Like

Unfortunately, using constants is not the reason for the increase in FPS speed. Here is a small change in the code that demonstrates this. If the value of ‘w2’ is changed to 640 (which is the exact same value as ‘w’), then the FPS increases by 20 times faster.

void setPixels(int w) {
  final int w2 = w;

  for (int i = 0; i < pixels.length; i++) {
    int x = i % w2;
    int y = i / w2;
    int index = x + (y * w2);
    pixels[index] = 0xff9900;
  }
}

I think I should mention in this thread, I’m not trying to do anything useful with the ‘setPixels’ method. This sketch I have posted was simply an example I came up with to demonstrate a speed problem that I discovered when using variables in a for loop.

Since using a single variable that doesn’t change in value causes a FPS drop by at least 20 times, this is a BIG DEAL, since there are a lot of sketches that could run considerably faster if we can find out what is causing this problem, and modify the code in those sketches to avoid the speed problem. :smiley:

1 Like

So why Processing doesn’t use that in the background function?

import static java.util.Arrays.*;
int start;
color c = color(0);

void setup() {
  size(500, 500);
  background(255);
  fill(0);
  loadPixels();
  
  start = millis(); 
  for (int i = 0; i < 10000; i++) java.util.Arrays.fill(pixels, color(c));    
  println("Arrays.fill = "+(millis()-start));
 
  start = millis();
  for(int i = 0; i < 10000; i++) background(color(c));
  println("background() = "+(millis()-start));
 
  start = millis();
  for(int i = 0; i < 10000; i++) rect(0, 0, width, height);
  println("rectangle = "+(millis()-start));
  
  start = millis();
  for (int i = 0; i < 10000; i++) {
    for (int y = 0; y < height; y++ ) { 
      for (int x = 0; x < width; x++ ) { 
        int loc = x+y*width;
        pixels[loc] = color(0);
      }
    }
  }
  println("loc = "+(millis()-start)); 
  updatePixels();
}

But this works (also as a global constant)
final int w2 = 640;

But not as a global variable
int w2 = 640;
(But it works locally in the function, as you demonstrated).

Really odd. I’d think the pixels loop would be much more time consuming than just getting one variable.

I forgot to include adamskii_uk 's method which is faster than the others except the very fast for GoToLoop’s method of course.

int start;

void setup() {
  size(500, 500);
  background(255);
  fill(0);
  loadPixels();
  start = millis();

  for (int i = 0; i < 10000; i++) {
    for (int y = 0; y < height; y++ ) { 
      for (int x = 0; x < width; x++ ) { 
        int loc = x+y*width;
        pixels[loc] = color(0);
      }
    }
  }
  println("loc = "+(millis()-start)); 

  start = millis();
  for (int j = 0; j < 10000; j++) {
    for (int i = 0; i < pixels.length; i++) {
      int x = i % 500;
      int y = i / 500;
      int index = x + (y * 500);
      pixels[index] = color(0);
    }
  }
  println("% /  = "+(millis()-start));
  updatePixels();
}
1 Like

Leaving aside how bad the benchmarks are (no warmup, use of millis(), etc.) two options come to mind.

Non-sequential array access - doesn’t seem to be here, may be in original code? CPU caches are optimized for accessing contiguous memory.

No bounds check elimination - all array access is bounds checked in Java, but the VM will eliminate the bounds checks where it can easily prove index values do not go out of range. It’s possible that this is not happening in the slower case.

2 Likes

I’ve made another discovery after making some edits to my original code. Here’s the modified code:

int[] values;

void setup() {
  size(640, 480);
  values = new int[width * height];
  fpsLoop();
}

void fpsLoop() {
  long startTime = System.nanoTime();
  int framesPerSecond = 0;
  
  while (true) {
    setValues(1);

    if (System.nanoTime() - startTime >= 1000000000) {
      println("fps: " + framesPerSecond);
      framesPerSecond = 0;
      startTime = System.nanoTime();
    }

    framesPerSecond++;
  }
}

void setValues(int step) {
  for (int i = 0; i < values.length; i += step) {}
}

I have created a ‘setValues’ method that takes a single argument called ‘step’ and I have passed in a value of 1 into the method. The local ‘step’ variable is then used directly in the for loop counter, to increment the ‘i’ value by 1 each time. When playing the sketch, the framerate starts at a consistant 7700+ fps, but then after approx 15 seconds, the framerate shoots up to 52 million+ fps. This could be the “warmup” part that neilcsmith mentioned in the previous post.

But, then I created a new int variable at the top of the ‘fpsLoop’ method and gave it a value of 1. I then passed this new variable as a parameter into the ‘setValues’ method. This time the framerate starts at 7700+ just like before, but then after approx 15 seconds then framerate drops down to 3800+ and stays there.

1 Like

Interestingly, replacing

  w = 640;

with

  if (w != 640) {
    throw new RuntimeException();
  }

results in the same performance boost.

I’ve decompiled and compared the bytecode generated with the original code, with w = 640;, and with if (w != 640) throw new RuntimeException();.

My findings:

  • There’s no optimisation happening at the compilation, the added code is simply added to the bytecode.
  • The same happens in plain Java, outside of Processing.
  • The same happens without using any arrays.

The real magic happen at execution, the JVM is somehow able to take shortcuts to make the code run faster.

Here’s plain Java code which produces the same behaviour:

static void benchmark() {
  for (int i = 0; i < 10; i++) {
    long start = System.currentTimeMillis();
    setPixels(640);
    System.out.printf("%4d\n", System.currentTimeMillis() - start);
  }
}

static long sum = 0;

static void setPixels(int w) {
  // w = 640;

  if (w != 640) {
    throw new RuntimeException();
  }

  for (long i = 0; i < 200000000; i++) {
    long index = i / w;
    sum += index;
  }
}

Which outputs:

1499
1524
 208
 211
 208
 205
 208
 204
 204
 205

The first 2 calls of the methods take as long as they do without the optimisation, then the rest are much faster.


You can run the code without any runtime optimisation:

java -Djava.compiler=NONE MyClass

This will take much longer, and won’t get faster by uncommenting the magic code.

3 Likes

Yes, this is exactly how the JVM works. Bytecode is compiled to native code and continually optimized at runtime. The Java compiler generally does no optimization - this allows for code to run faster on newer JVMs. There’s a massive amount of “magic” that happens when running - code reordered, loops unrolled, inlined methods, objects never created (escape analysis), fields cached locally, branch prediction, speculative optimization (eg. assuming method calls are always receiving same type).

It can also lead to some odd effects - eg. quite a bit of naive multi-threaded code around here is broken because people assume values aren’t cached or code executes in the order it’s written.

Likewise, the JVM traps segfaults to catch some occasions where it’s made the wrong assumption.

Likewise, did you know that an object can be garbage collected before its constructor has even finished executing?!

There’s a whole rabbit hole of interesting info to explore around Java runtime optimizations! :smile:

3 Likes