Gradient Descent

Post by johnchu »

Hi, your explanation of gradient descent in Section 3.20 (Theory) says to "minimize" dJ/dW, but the example given in Section 3.21 (Code) maximizes it; that is, it uses W += learning_rate * ... I was expecting the code to be W -= learning_rate * ... so that we keep subtracting until we find the smallest W. Thanks in advance for the clarification.

Post by lazyprogrammer »

Thanks for your question!

Maximizing J is the same as minimizing -J.

To give you an example, maximizing -x^2 is the same as minimizing x^2; both yield the same value of x (here, x = 0).

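The "gradient ascent" and "gradient descent" update equations are then exactly the same. Taking J = w^2 (so dJ/dw = 2w) and learning rate a:

w = w - a*(2w)     (gradient descent on w^2)

w = w + a*(-2w)    (gradient ascent on -w^2)

As you recall, adding a negative is the same as subtracting: +(-b) = -b.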
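If it helps, the equivalence can be checked numerically. This is only a minimal sketch, not the code from Section 3.21: the function names `descend` and `ascend`, the starting point 5.0, the learning rate 0.1, and the step count are all made up for illustration. It runs gradient descent on J(w) = w^2 and gradient ascent on -J(w) = -w^2 and compares where they end up.

```python
def descend(w, a, steps):
    """Gradient descent on J(w) = w**2, whose gradient is 2*w."""
    for _ in range(steps):
        w = w - a * (2 * w)
    return w


def ascend(w, a, steps):
    """Gradient ascent on -J(w) = -w**2, whose gradient is -2*w."""
    for _ in range(steps):
        w = w + a * (-2 * w)
    return w


# Hypothetical settings, chosen only for this illustration.
w_down = descend(5.0, 0.1, 50)
w_up = ascend(5.0, 0.1, 50)

# Both runs follow the same sequence of updates and approach w = 0.
print(w_down, w_up)
```

So whether the code writes `+=` or `-=` depends only on whether the sign has been folded into the gradient term.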
