Needle in a NaN stack

So, long story short, I am working on a neural network. Here it is, plus all the assorted files:

Read through my comments to better understand it, especially the keyPressed() function to see what sorts of actions you can do. Now, its main goal is to train itself to recognize the digits found in the imported files. To do just that, launch it, press e, then a, and then enter. Wait until it finishes, press a and enter and wait again. Something should be printed out now.

Repeat this last step (press a, then enter, then wait) for as long as you want. On every second iteration, the epoch counter (the number right below the 28x28 grid) goes up. See how far you get before all the numbers turn into tiny squares and the console starts printing NaN.

This is driving me nuts. It's just simple math happening behind the scenes, no different from a physics simulator. The parameters (weights and biases) are adjusted with velocity parameters, which are themselves updated with something akin to a derivative. All of this happens within the calcBatchDerivative() function. It's a complicated, self-referential function, but I know it works, since sometimes the network DOESN'T break, haha.
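For anyone following along without the sketch: the update described above sounds like gradient descent with momentum. Here is a minimal, self-contained sketch of that scheme (the names `weight`, `velocity`, `learningRate` and the toy gradient are illustrative, not the poster's actual code), showing how a step size that is too large for the loss curvature makes the weight diverge, overflow to Infinity, and finally collapse into NaN:

```java
public class MomentumSketch {
    // Toy momentum update on a single weight for loss 1.5 * w^2 (gradient 3w).
    // The learning rate is deliberately too large, so the update diverges.
    static float runToyTraining(int steps) {
        float learningRate = 1.5f; // unstable for this curvature
        float momentum = 0.9f;
        float weight = 1.0f;
        float velocity = 0.0f;
        for (int i = 0; i < steps; i++) {
            float gradient = 3.0f * weight; // grows as the weight grows
            velocity = momentum * velocity - learningRate * gradient;
            weight += velocity;
        }
        return weight;
    }

    public static void main(String[] args) {
        // The weight's magnitude roughly doubles each step, overflows to
        // Infinity, and the next Infinity - Infinity produces NaN.
        System.out.println(runToyTraining(200)); // prints NaN
    }
}
```

The point of the sketch: nothing here is "broken math" in the usual sense — the formula is fine, the step size just makes the iteration unstable, which is exactly the kind of thing that sometimes blows up and sometimes doesn't depending on initialization.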

So, my question is, what's causing the NaNs? I guess they all flip simultaneously due to a ripple effect, but I don't know where it begins or what triggers it. I know that NaNs can occur when a number gets too large. But there isn't anything that should make those weights and biases jump like that (unless I've screwed up the math, but that is unlikely).
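One detail that helps explain the "everything turns at once" effect: a float that grows too large doesn't become NaN directly. It overflows to Infinity first, and NaN only appears once that Infinity hits arithmetic with no defined result — and from there it is contagious, because any operation involving a NaN yields NaN. A small demonstration of the standard IEEE 754 behavior:

```java
public class NanOrigin {
    public static void main(String[] args) {
        float big = Float.MAX_VALUE;
        float overflowed = big * 2.0f;               // overflow -> Infinity, not NaN
        System.out.println(overflowed);              // Infinity
        System.out.println(overflowed - overflowed); // Infinity - Infinity -> NaN
        System.out.println(overflowed / overflowed); // Infinity / Infinity -> NaN
        System.out.println(0.0f * overflowed);       // 0 * Infinity -> NaN
    }
}
```

So a useful intermediate clue to look for is Infinity in the weights one or two updates *before* the NaNs appear.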

If you set batchSize to 1 before launch, then press e, but not a, you can keep pressing enter to launch the calcBatchDerivative() function individually. That could be helpful for debugging.

If you have any questions about the code, I can fill you in.


The best advice I can give you is to add print statements or use a debugger to understand exactly what your code is doing. Can you try to isolate the problem in a small example that reproduces the issue?

Other things can produce NaNs too. These statements both print NaN:

println(2.0 % 0);
println(0.0 / 0.0);
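One gotcha when hunting these down: you can't find a NaN with `==`, because NaN compares unequal to everything, including itself. `Float.isNaN()` is the reliable check:

```java
public class NanChecks {
    public static void main(String[] args) {
        float nan = 0.0f / 0.0f;              // produces NaN
        System.out.println(nan == nan);       // false -- NaN never equals anything
        System.out.println(Float.isNaN(nan)); // true  -- the correct test
    }
}
```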

I just know it’s the calcBatchDerivative() function that causes this. I guess I will use the console to check it step by step. But with neural networks, this can oftentimes be tedious, as there can be tens of thousands of parameters :/

You might also try to isolate your error by wrapping different sections of your code in try/catch blocks, and/or asserting that low-level values are *not* NaN and seeing exactly when and where that assertion first fails.
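For the "tens of thousands of parameters" problem, a guard like the following avoids eyeballing anything: call it on each weight array after every calcBatchDerivative() pass, and it throws at the first bad value with its index. This is a sketch with illustrative names, not the poster's actual code; it also flags Infinity, since (as noted above) Infinity typically shows up one step before the NaNs do.

```java
public class NanGuard {
    // Throws at the first NaN or Infinity, identifying the array and index.
    static void checkFinite(float[] values, String label) {
        for (int i = 0; i < values.length; i++) {
            if (Float.isNaN(values[i]) || Float.isInfinite(values[i])) {
                throw new IllegalStateException(
                    label + "[" + i + "] became " + values[i]);
            }
        }
    }

    public static void main(String[] args) {
        float[] weights = {0.1f, -0.3f, 1.0f / 0.0f}; // last entry is Infinity
        checkFinite(weights, "weights"); // throws, pinpointing weights[2]
    }
}
```

Sprinkling a call like `NanGuard.checkFinite(weights, "layer1 weights")` after each update narrows the first failure down to one batch and one parameter, which is far less tedious than reading thousands of values in the console.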