David Turvene
1 min read · Sep 26, 2019


The gradient in gradient descent is the derivative of the MSE, so the factor should be (2/m) instead of (1/m). You’ll see the same eventual result, but the correct equation converges more quickly.
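For concreteness, with the usual simple-regression hypothesis h_θ(x) = θ₀ + θ₁x (my notation; the article’s symbols may differ), the cost and its slope derivative are:

```latex
J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x_i) - y_i\bigr)^2,
\qquad
\frac{\partial J}{\partial \theta_1} = \frac{2}{m}\sum_{i=1}^{m}\bigl(h_\theta(x_i) - y_i\bigr)\,x_i
```

That factor of 2 from the chain rule is exactly where the (2/m) comes from.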

Also, using the variable ‘m’ for the sample count (the length of the data) is confusing, since ‘m’ is conventionally used for the slope.

See https://ml-cheatsheet.readthedocs.io/en/latest/gradient_descent.html

Overall, good article, and great use of NumPy linear-algebra dot products for efficiency. I also like storing theta (slope, bias) and the cost history for each iteration/epoch to see progress.
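To illustrate both points together, here’s a minimal sketch of the loop I have in mind: plain batch gradient descent with NumPy dot products, using the 2/n gradient factor and recording theta and cost every epoch (the function and variable names are mine, not the article’s):

```python
import numpy as np

def gradient_descent(X, y, lr=0.01, epochs=1000):
    """Batch gradient descent for linear regression, keeping per-epoch history.

    X is the design matrix with a leading column of ones for the bias term.
    """
    n = len(y)                    # sample count (avoiding 'm' to dodge the slope clash)
    theta = np.zeros(X.shape[1])  # parameters: (bias, slope)
    theta_history, cost_history = [], []
    for _ in range(epochs):
        error = X.dot(theta) - y            # vectorized predictions minus targets
        cost = (error ** 2).mean()          # MSE
        grad = (2.0 / n) * X.T.dot(error)   # MSE derivative: note the 2/n factor
        theta -= lr * grad
        theta_history.append(theta.copy())  # snapshot (bias, slope) each epoch
        cost_history.append(cost)
    return theta, theta_history, cost_history
```

Plotting cost_history afterward is a quick sanity check that the loop is actually converging.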
