10.2. Exploding Gradient Problem
The exploding gradient problem is another difficulty in training deep neural networks, essentially the flip side of the vanishing gradient problem. It occurs when gradients grow excessively large, making the learning process unstable and inefficient.
In detail, during backpropagation gradients are passed backward through the network, and at each layer they are multiplied by that layer's weights. When those weights are large, or the incoming gradients are already large, this repeated multiplication compounds across layers. In a deep network, i.e., one with many layers, the gradients can therefore grow very large and produce very large updates to the model weights during training.
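The following is a minimal PyTorch sketch of this effect; the depth, width, and the 1.5 weight-scaling factor are illustrative choices, not values from the text. Scaling the weights of an otherwise well-initialized deep network makes the gradient that flows back to the input larger by a multiplicative factor per layer, which compounds to an enormous value.

```python
import torch
import torch.nn as nn

def deep_mlp(depth: int, width: int, weight_scale: float) -> nn.Sequential:
    """Stack of Linear+ReLU layers whose weights are scaled by `weight_scale`
    relative to Kaiming initialization (scale 1.0 = standard init)."""
    layers = []
    for _ in range(depth):
        linear = nn.Linear(width, width, bias=False)
        nn.init.kaiming_normal_(linear.weight)
        with torch.no_grad():
            linear.weight.mul_(weight_scale)
        layers += [linear, nn.ReLU()]
    return nn.Sequential(*layers)

def input_grad_norm(model: nn.Sequential, width: int) -> float:
    """Norm of the gradient w.r.t. the input: a direct measure of how the
    product of per-layer Jacobians scales gradients during backpropagation."""
    x = torch.randn(8, width, requires_grad=True)
    model(x).sum().backward()
    return x.grad.norm().item()

torch.manual_seed(0)
width, depth = 64, 50
print("well-scaled init:", input_grad_norm(deep_mlp(depth, width, 1.0), width))
print("over-scaled init:", input_grad_norm(deep_mlp(depth, width, 1.5), width))
```

With the standard initialization the gradient norm stays moderate, while the over-scaled version is larger by many orders of magnitude, because each extra layer multiplies the gradient by a factor greater than one.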
This ultimately leads to an unstable network: the weights can grow until they overflow numerically and turn into NaN values. Moreover, instead of converging to a good, stable solution, the network's performance can become highly volatile, bouncing around the loss landscape.
Several methods exist for mitigating the exploding gradient problem. These include gradient clipping (which caps the size of the gradient, as sketched below), better weight initialization strategies, changing the architecture of the network, and using different optimization strategies.
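As a concrete illustration of gradient clipping, here is a minimal sketch of a single training step using PyTorch's `torch.nn.utils.clip_grad_norm_`; the model, the random data, and the `max_norm` value of 1.0 are placeholder choices for illustration.

```python
import torch
import torch.nn as nn

# Placeholder model, optimizer, and data for a single illustrative step.
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
inputs, targets = torch.randn(32, 10), torch.randn(32, 1)

optimizer.zero_grad()
loss = loss_fn(model(inputs), targets)
loss.backward()

# Rescale all gradients so their combined norm is at most 1.0; this caps
# the size of the update without changing its direction.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

optimizer.step()
```

The key point is that clipping happens after `backward()` and before `optimizer.step()`, so the optimizer only ever sees gradients whose norm is bounded by the chosen threshold.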
Just like the vanishing gradient problem, the exploding gradient problem tends to occur more with certain types of activation functions, typically unbounded ones such as ReLU that do not squash their input into a small range. And similar to the vanishing gradient problem, the exploding gradient problem makes it challenging to effectively train deep neural networks, as it results in large, inefficient steps during the learning process.