Gradient Descent

johnchu
Posts: 1
Joined: Thu Aug 02, 2018 2:43 am

Gradient Descent

Post by johnchu » Thu Aug 02, 2018 2:54 am

Hi, your explanation of Gradient Descent in Section 3.20 (Theory) says to "minimize" using dJ/dW, but the example given in Section 3.21 (Code) appears to maximize instead, i.e. W += learning_rate * ... I was expecting the code to be W -= learning_rate * ... so that we keep subtracting until we find the W that minimizes the cost. Thanks in advance for the clarification.

lazyprogrammer
Site Admin
Posts: 9
Joined: Sat Jul 28, 2018 3:46 am

Re: Gradient Descent

Post by lazyprogrammer » Thu Aug 23, 2018 9:13 pm

Thanks for your question!

Maximizing J is the same as minimizing -J.

To give you an example, maximizing -x^2 is the same as minimizing x^2: both yield the same value of x (namely x = 0).
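As a quick numerical check (my own toy sketch, not the course code; the grid range and variable names are just illustrative choices):

```python
import numpy as np

x = np.linspace(-5, 5, 1001)         # grid of candidate x values
best_for_max = x[np.argmax(-x**2)]   # x that maximizes -x^2
best_for_min = x[np.argmin(x**2)]    # x that minimizes x^2
print(best_for_max, best_for_min)    # both are the same optimum, x = 0
```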

The "gradient ascent" or "gradient descent" equations are actually exactly the same.

w = w - a*2x

w = w + a*(-2x)

As you recall +(-a) = -a.
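Here is a minimal sketch of the two updates side by side for the toy cost J(w) = w^2; the learning rate, starting point, and variable names are arbitrary choices for illustration, not the course's logistic regression code:

```python
learning_rate = 0.1
w_descent = 5.0   # updated by gradient descent on J(w) = w^2
w_ascent = 5.0    # updated by gradient ascent on -J(w) = -w^2

for _ in range(50):
    w_descent = w_descent - learning_rate * (2 * w_descent)   # w -= a * dJ/dw
    w_ascent = w_ascent + learning_rate * (-2 * w_ascent)     # w += a * d(-J)/dw

# Both variables hold identical values, approaching the minimizer w = 0.
print(w_descent, w_ascent)
```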

