Gradient Descent
Hi, your explanation of gradient descent in Section 3.20 (Theory) said to "minimize" J by moving against dJ/dW, but the example in Section 3.21 (Code) moves in the direction of dJ/dW, that is, W += learning_rate * ... I was expecting the code to be W -= learning_rate * ... so that we keep subtracting until we reach the minimum. Thanks in advance for the clarification.

 Site Admin
Re: Gradient Descent
Thanks for your question!
Maximizing J is the same as minimizing -J: the W += learning_rate * ... update is gradient ascent on J, which is exactly gradient descent on -J.
To give you an example, maximizing -w^2 is the same as minimizing w^2; both yield the same value, w = 0.
The "gradient descent" and "gradient ascent" update equations are actually exactly the same:
w = w - a*2w      (descent on w^2, whose gradient is 2w)
w = w + a*(-2w)   (ascent on -w^2, whose gradient is -2w)
As you recall, b + (-a) = b - a, so adding the ascent step is the same as subtracting the descent step.
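A minimal sketch of the equivalence (not the course's actual code, and using the toy objective w^2 rather than the cost from Section 3.21): running gradient descent on w^2 and gradient ascent on -w^2 produces identical iterates at every step.

```python
# Toy demonstration: gradient ascent on J(w) = -w^2 is identical,
# step for step, to gradient descent on -J(w) = w^2.

learning_rate = 0.1

w_descent = 5.0  # minimized via gradient descent on w^2 (gradient: 2w)
w_ascent = 5.0   # maximized via gradient ascent on -w^2 (gradient: -2w)

for _ in range(50):
    w_descent -= learning_rate * (2 * w_descent)   # w = w - a*2w
    w_ascent += learning_rate * (-2 * w_ascent)    # w = w + a*(-2w)

print(w_descent, w_ascent)  # both converge toward 0 and match exactly
```

The same reasoning is why code that maximizes a log-likelihood with W += learning_rate * dJ/dW is doing the same thing as code that minimizes the negative log-likelihood with W -= learning_rate * d(-J)/dW.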