# Deep neural networks

Deep neural networks are those with multiple hidden layers. They are super-powerful beasts in that they allow us to solve very complicated problems with astonishing performance, but they are also quite delicate to train: the gradient tends to shrink in the earlier layers, so some care has to be taken to set them up in the best possible way.

The reason behind this is to be found in the computation of the derivative of the cost function itself. When you do it for a deep network, you find that the gradient at a given layer is a product of factors of the type $f'w$, one for each later layer, where $f$ is the activation function and $w$ the weight at that layer. In the case of sigmoid neurons, for instance, where the sigmoid is the activation function, the derivative has a bell shape, peaked at 0 with a maximum value of 0.25. As long as the weights are not large (say $|w| < 1$), each factor is then smaller than 0.25 in absolute value, so the product gets smaller and smaller the further back we go, that is, the more layers we cross. This is why the earlier the layer, the smaller the gradient. The source of this unstable behaviour is that the gradient in early layers is a product of terms from all the later layers, so small values compound the more factors there are.
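
The shrinking product described above can be illustrated numerically. What follows is only a minimal sketch, not a real network: the function names are hypothetical, the weights are assumed to be drawn uniformly from $[-1, 1]$ and the pre-activations from a standard normal, and nothing is trained. It just multiplies factors of the type $f'w$ one layer at a time, with the sigmoid as $f$.

```python
import math
import random


def sigmoid(z):
    """Sigmoid activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    """Derivative of the sigmoid: bell-shaped, with maximum 0.25 at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)


def gradient_factor_products(n_layers, seed=0):
    """Running product of |f'(z) * w| factors, going from the last layer backwards.

    Assumption for illustration only: one weight and one pre-activation per
    layer, with w ~ Uniform(-1, 1) and z ~ Normal(0, 1).
    """
    rng = random.Random(seed)
    product = 1.0
    products = []
    for _ in range(n_layers):
        w = rng.uniform(-1.0, 1.0)  # assumed weight, |w| <= 1
        z = rng.gauss(0.0, 1.0)     # assumed pre-activation
        product *= abs(sigmoid_prime(z) * w)
        products.append(product)
    return products


# Print how the product decays as more layers are crossed.
for depth, value in enumerate(gradient_factor_products(10), start=1):
    print(f"{depth:2d} layers back: product of factors = {value:.3e}")
```

Because each factor is at most 0.25 under these assumptions, the product after $n$ layers cannot exceed $0.25^n$, which is why the printed values drop so quickly with depth.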