A "Hello World" of Neural Networks - Number Recognition

Yesterday I decided I wanted to start getting into neural networks, so after watching some awesome tutorials, I decided to try coding a number recognizer using the MNIST database, a project typically seen as introductory. Getting the ideas down was easy, but the math behind it was pretty hard to grasp at first (and the notation was AWFUL).

Now, for the hyperparameters:

  • 784 input neurons, 2 hidden layers of 20 neurons each, and 10 output neurons
  • Step size (learning rate) of 0.2; one batch of 1,000 images, looped over 100 times
  • Tested the network on 1,000 images it hadn't seen before
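For anyone curious what that setup looks like in code, here's a minimal numpy sketch of a 784-20-20-10 network trained by gradient descent with a step size of 0.2. This is a hypothetical reconstruction, not my actual source: it assumes sigmoid activations and a squared-error loss (the post doesn't pin those down), and random data stands in for the MNIST batch.

```python
import numpy as np

rng = np.random.default_rng(0)

# 784 inputs -> two hidden layers of 20 -> 10 outputs, as described above.
sizes = [784, 20, 20, 10]
# Weights and biases drawn uniformly from [-0.3, 0.3].
weights = [rng.uniform(-0.3, 0.3, (n_out, n_in))
           for n_in, n_out in zip(sizes, sizes[1:])]
biases = [rng.uniform(-0.3, 0.3, (n, 1)) for n in sizes[1:]]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    """Return the activations at every layer for one input column vector."""
    activations = [x]
    for W, b in zip(weights, biases):
        activations.append(sigmoid(W @ activations[-1] + b))
    return activations

def train_step(x, y, lr=0.2):
    """One gradient-descent step on squared error for a single example."""
    acts = forward(x)
    # Output-layer delta; sigmoid derivative is a * (1 - a).
    delta = (acts[-1] - y) * acts[-1] * (1 - acts[-1])
    for i in reversed(range(len(weights))):
        grad_W = delta @ acts[i].T
        grad_b = delta
        if i > 0:
            # Backpropagate before updating this layer's weights.
            delta = (weights[i].T @ delta) * acts[i] * (1 - acts[i])
        weights[i] -= lr * grad_W
        biases[i] -= lr * grad_b

# Random data stands in for a real MNIST image and its one-hot label here.
x = rng.random((784, 1))
y = np.zeros((10, 1))
y[3] = 1.0

loss_before = float(np.sum((forward(x)[-1] - y) ** 2))
for _ in range(100):  # "looped 100 times"
    train_step(x, y)
loss_after = float(np.sum((forward(x)[-1] - y) ** 2))
```

Even on fake data, the loss drops steadily, which is a decent smoke test that the backprop math is wired up right.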

I’ve seen these parameters yield 98–99.7% test accuracy across different runs. I am really happy with this! I might share the source code if anyone’s interested, so that I can hopefully make it better.

Also, if anyone here is well versed in neural networks, I did notice two strange things:

  • The way I randomly initialized the weights and biases at the start greatly influenced the peak accuracy the network could reach. Randomizing them all from -0.3 to 0.3 yielded good results, but a range of -3 to 3, or 0 to 1, could completely break my network. Why does it matter so much?
  • The number of batches, the layer sizes, and the batch size all influenced my network’s performance quite a bit. Is there a way to find an optimal configuration for these hyperparameters from the start, without going through so much trial and error?
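On the first question, I suspect the culprit is saturation: with sigmoid-style activations, large initial weights push the pre-activations far into the flat tails of the function, where the gradient factor a·(1−a) is nearly zero, so learning stalls; an all-positive range like 0 to 1 also pushes every pre-activation in the same direction. (The standard fix is to scale the initial range with 1/√fan_in, i.e. Xavier/Glorot initialization.) A quick hypothetical sketch of the effect, assuming sigmoid activations and a fake 784-pixel input:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.random((784, 1))  # one fake 784-pixel input with values in [0, 1]

factors = {}
for lo, hi in [(-0.3, 0.3), (-3.0, 3.0)]:
    # First-layer weights for a 20-neuron hidden layer, drawn from [lo, hi].
    W = rng.uniform(lo, hi, (20, 784))
    a = sigmoid(W @ x)
    # a * (1 - a) is the sigmoid's gradient factor; it peaks at 0.25,
    # and values near zero mean the neuron is saturated.
    factors[(lo, hi)] = float(np.mean(a * (1 - a)))
    print(f"init range [{lo}, {hi}]: mean gradient factor = {factors[(lo, hi)]:.4f}")
```

The wider initialization produces a far smaller mean gradient factor, which is exactly the "completely broken" behavior: the error signal flowing back through saturated neurons is almost zero.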

Here are some images from different runs (I couldn’t draw the whole input layer to the screen lol):

Alright, that’s all I have for today. To anyone who sees this, have a great day!


Go check these series out for neural nets on YouTube:

  • 3blue1brown’s series
  • Finn Eggers’s series