There’s only one bias per neuron, not one for each weight. The calculation for a single neuron should be output = activation(w1*x1 + w2*x2 + ... + wn*xn + b), i.e. one bias b added after the weighted sum.
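As a minimal NumPy sketch (the names and shapes here are my own choices, not from any particular library), a whole layer looks like this. Note the bias is a vector with one entry per neuron, while the weights form a matrix with one entry per (neuron, input) pair:

```python
import numpy as np

def dense_layer(x, W, b, activation):
    """Forward pass for one fully connected layer.

    x: input vector, shape (n_in,)
    W: weight matrix, shape (n_out, n_in) -- one weight per input, per neuron
    b: bias vector,  shape (n_out,)       -- exactly one bias per neuron
    """
    return activation(W @ x + b)

# Hypothetical example: a layer of 30 neurons on a flattened 28x28 image
x = np.random.randn(784)
W = np.random.randn(30, 784) * 0.01
b = np.zeros(30)
h = dense_layer(x, W, b, np.tanh)  # h has shape (30,)
```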
As an activation function, ReLU would probably work better than sigmoid. It’s much faster to compute, and at least in deep networks it gives results about as good as sigmoid’s.
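A quick sketch of both makes the speed difference obvious: sigmoid needs an exponential for every element, while ReLU is just a comparison:

```python
import numpy as np

def sigmoid(x):
    # One exponential per element -- relatively expensive
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Just an elementwise max with zero -- much cheaper than exp
    return np.maximum(0.0, x)
```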
Multilayer perceptrons don’t handle scaling or translation of the input well; convolutional networks work much better with images. Still, you should be able to get ~98% test accuracy on the MNIST data using a multilayer perceptron.
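If you want to sanity-check that number, here’s a minimal sketch using Keras (my choice of library and hyperparameters, nothing prescribed above); a plain MLP like this typically lands around 98% on the MNIST test set:

```python
import tensorflow as tf

# Load MNIST and scale pixel values to [0, 1]
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),                        # 784 inputs
    tf.keras.layers.Dense(256, activation="relu"),    # hidden layer, ReLU as suggested
    tf.keras.layers.Dense(10, activation="softmax"),  # one output per digit class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=10,
          validation_data=(x_test, y_test))
```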