Switch Net 4 - Neural Network

Basically you are knitting together multiple tiny width-4 neural network layers, using the fast Walsh-Hadamard transform as a connectionist device between them, to make a deep neural network.
The mini-layer width doesn’t have to be 4; it could be 1, 2, 8, 16 or whatever.
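Here is a minimal NumPy sketch of the idea (names like fwht and SwitchLayer are mine, not from any released Switch Net code): a fast in-place Walsh-Hadamard transform mixes all n values globally using only additions and subtractions, then each group of 4 values passes through its own tiny 4x4 weighted block and non-linearity.

```python
import numpy as np

def fwht(x):
    # Fast Walsh-Hadamard transform, O(n log n) adds/subtracts.
    # len(x) must be a power of 2.
    x = x.copy()
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x / np.sqrt(n)  # orthonormal scaling keeps magnitudes stable

class SwitchLayer:
    # One layer: parameter-free global mixing via the WHT,
    # then independent tiny width-4 weighted blocks.
    def __init__(self, n, block=4, seed=0):
        assert n % block == 0
        rng = np.random.default_rng(seed)
        self.block = block
        # n/block small matrices of block x block weights, plus biases
        self.W = rng.standard_normal((n // block, block, block)) * 0.5
        self.b = np.zeros((n // block, block))

    def forward(self, x):
        y = fwht(x)                    # every output sees every input
        y = y.reshape(-1, self.block)  # cut into tiny width-4 groups
        y = np.einsum('gij,gj->gi', self.W, y) + self.b
        y = np.maximum(y, 0.0)         # plain ReLU stand-in for Beyond ReLU
        return y.reshape(-1)

# Stack a few layers to get a deep network over 16 values.
x = np.random.default_rng(42).standard_normal(16)
for k in range(3):
    x = SwitchLayer(16, seed=k).forward(x)
print(x)
```

The transform plays the role the big weight matrix normally plays: it spreads information from every input to every output, so the tiny blocks never have to.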
I used Beyond ReLU for the neural layers, but you could use any non-linear neural layer type, or even just some parametric function.
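For concreteness, one plausible shape for such a parametric function (this is my illustration of the general flavor; Beyond ReLU itself is described in the next section) is a per-neuron two-slope function with trainable slopes:

```python
import numpy as np

def two_slope(x, a, b):
    # Hypothetical parametric activation: slope a on the positive side,
    # slope b on the negative side, both trainable per neuron.
    # a=1, b=0 recovers plain ReLU.
    return np.where(x >= 0.0, a * x, b * x)
```

The plain ReLU stand-in in the sketch above could be swapped for something like this.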
In total the tiny layers use far fewer parameters and are far faster than a conventional dense layer would be: the WHT takes O(n log n) additions and subtractions and the width-4 blocks take O(n) multiplies, versus O(n²) multiply-adds for an n-to-n dense layer.
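As a rough worked example for n = 256, ignoring biases:

```python
n = 256
dense_params  = n * n                 # conventional dense layer: 65,536 weights
switch_params = (n // 4) * (4 * 4)    # 64 width-4 blocks: 1,024 weights
print(dense_params // switch_params)  # 64x fewer, and the WHT itself has no weights
```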
Beyond ReLU