Training a deep learning (DL) model means searching for a minimum of the loss in an n-dimensional parameter space, which can be a challenge. Data scientists and ML engineers often use gradient descent to drive that search, with a starting step size (learning rate) typically somewhere between 1e-3 and 1e-4. With a constant step size, however, the optimizer does not approach a local minimum quickly. There are also a few issues with this approach. 1) The first local minimum found may not be the best one. The optimizer can get stuck in a sharp valley, where any small change in the weights pushes the error rate up to 50% or more. 2) The first point found may not be a minimum at all: it can be a maximum along some directions, as shown in the saddle point graph below. When optimizing over the n-dimensional space of a DL model, the better outcome is a flat valley, where SGD settles on stable ground and the error rate stays low, or close to the best value reached during optimization, even when the weights shift slightly. However, there is a better way than this. Instead of manually entering an initial step size for gradient descent and updating it...
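To make the points above concrete, here is a minimal sketch in plain Python. It is a toy illustration, not code from any particular framework: the one-dimensional loss `f(x) = x**2`, the curvature values, and the helper names `gradient_descent`, `loss_increase`, and `saddle_loss` are all assumptions chosen for this example.

```python
def gradient_descent(grad, x0, lr, steps):
    """Plain gradient descent with a constant step size lr."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x


# Toy loss f(x) = x**2 (an assumption for illustration); its gradient is 2*x
# and its only minimum sits at x = 0.
def grad_quadratic(x):
    return 2.0 * x


# A constant step size of 1e-3 barely moves in 100 steps (x stays near 0.82),
# while a larger constant step of 1e-1 lands almost exactly on the minimum.
x_small_lr = gradient_descent(grad_quadratic, x0=1.0, lr=1e-3, steps=100)
x_large_lr = gradient_descent(grad_quadratic, x0=1.0, lr=1e-1, steps=100)


def loss_increase(curvature, delta):
    """Rise in f(x) = curvature * x**2 when x moves delta away from the minimum."""
    return curvature * delta ** 2


# Sharp valley (high curvature) vs flat valley (low curvature):
# the same small shift in the weights costs far more in the sharp one.
sharp_rise = loss_increase(curvature=100.0, delta=0.1)
flat_rise = loss_increase(curvature=0.01, delta=0.1)


def saddle_loss(x, y):
    """f(x, y) = x**2 - y**2: a minimum along x, a maximum along y."""
    return x ** 2 - y ** 2


# The gradient (2x, -2y) vanishes at the origin, yet stepping along y lowers
# the loss, so the origin is a saddle point and not a minimum.
loss_at_origin = saddle_loss(0.0, 0.0)
loss_along_y = saddle_loss(0.0, 0.1)

if __name__ == "__main__":
    print("x after 100 steps, lr=1e-3:", x_small_lr)
    print("x after 100 steps, lr=1e-1:", x_large_lr)
    print("loss rise in sharp valley:", sharp_rise)
    print("loss rise in flat valley:", flat_rise)
    print("loss at saddle vs nearby:", loss_at_origin, loss_along_y)
```

Real DL losses are far less well behaved than these toy functions, but the same three effects show up: a small fixed step is slow, a sharp minimum is fragile, and a zero gradient does not guarantee a minimum.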