The gradient descent update uses the derivative of the MSE, so the factor should be (2/m) instead of (1/m). With the same learning rate you'll reach the same result eventually, but the correct equation converges more quickly.
Also, using the variable 'm' for the number of samples is confusing when 'm' is normally used for the slope.
See https://ml-cheatsheet.readthedocs.io/en/latest/gradient_descent.html
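
To illustrate what I mean, here's a minimal sketch (my own illustration with made-up names, not the article's code) showing where the 2 comes from when you differentiate the MSE:

```python
import numpy as np

# MSE = (1/m) * sum((X @ theta - y)**2), so d(MSE)/d(theta) carries a factor of 2/m.
def mse_gradient(X, y, theta):
    m = len(y)                       # number of samples ('m' here is a length, not a slope)
    errors = X @ theta - y           # residuals
    return (2 / m) * (X.T @ errors)  # the 2 comes from differentiating the squared term
```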
Overall, a good article, with great use of NumPy dot products for efficient linear algebra. I also like storing theta (slope, bias) and the cost history for each iteration/epoch to see progress.
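
Roughly what I have in mind (again just a sketch with my own variable names, assuming a simple fixed learning rate, not the article's exact loop):

```python
import numpy as np

def gradient_descent(X, y, theta, lr=0.01, epochs=1000):
    theta_history, cost_history = [], []
    m = len(y)
    for _ in range(epochs):
        errors = X @ theta - y
        cost_history.append(np.mean(errors ** 2))         # MSE before this update
        theta = theta - lr * (2 / m) * (X.T @ errors)      # 2/m factor from the MSE derivative
        theta_history.append(theta.copy())                 # (slope, bias) after this epoch
    return theta, theta_history, cost_history
```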