David Turvene
1 min read · Sep 26, 2019


The gradient in gradient descent is the derivative of the MSE, so the factor should be (2/m) instead of (1/m). You’ll see the same eventual result, but the correct equation converges more quickly.
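For concreteness, with the usual simple-regression hypothesis h_θ(x) = θ₀ + θ₁x (my notation; the article’s symbols may differ), the cost and its slope derivative are:

```latex
J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x_i) - y_i\bigr)^2,
\qquad
\frac{\partial J}{\partial \theta_1} = \frac{2}{m}\sum_{i=1}^{m}\bigl(h_\theta(x_i) - y_i\bigr)\,x_i
```

That factor of 2 from the chain rule is exactly where the (2/m) comes from.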

Also, using the variable ‘m’ for the sample count (the length of the data) is confusing, since ‘m’ is conventionally used for the slope.

See https://ml-cheatsheet.readthedocs.io/en/latest/gradient_descent.html

Overall, good article, and great use of NumPy linear-algebra dot products for efficiency. I also like storing theta (slope, bias) and the cost history for each iteration/epoch to see progress.
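To illustrate both points together, here’s a minimal sketch of the loop I have in mind: plain batch gradient descent with NumPy dot products, using the 2/n gradient factor and recording theta and cost every epoch (the function and variable names are mine, not the article’s):

```python
import numpy as np

def gradient_descent(X, y, lr=0.01, epochs=1000):
    """Batch gradient descent for linear regression, keeping per-epoch history.

    X is the design matrix with a leading column of ones for the bias term.
    """
    n = len(y)                    # sample count (avoiding 'm' to dodge the slope clash)
    theta = np.zeros(X.shape[1])  # parameters: (bias, slope)
    theta_history, cost_history = [], []
    for _ in range(epochs):
        error = X.dot(theta) - y            # vectorized predictions minus targets
        cost = (error ** 2).mean()          # MSE
        grad = (2.0 / n) * X.T.dot(error)   # MSE derivative: note the 2/n factor
        theta -= lr * grad
        theta_history.append(theta.copy())  # snapshot (bias, slope) each epoch
        cost_history.append(cost)
    return theta, theta_history, cost_history
```

Plotting cost_history afterward is a quick sanity check that the loop is actually converging.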
