I took the lecture on the multidimensional linear model.

I understand that we are trying to minimize the error function we chose for the model.

What I don't understand are some of the assumptions:

1) That solving the linear system we get by taking the partial derivatives and setting them to 0 will yield the weight vector that specifically **minimizes** the error. How can we assume that the error function has a local minimum there (e.g., why not a local maximum)?

2) How can we assume that X.T.dot(X) is an invertible matrix?
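
To make both questions concrete, here is a minimal NumPy sketch. It assumes the error function is the sum of squared residuals, and it uses made-up toy data; the names `squared_error`, `w_star`, `gram`, and `X_dup` are mine, not from the lecture. For question 1 it looks at the eigenvalues of the second-derivative matrix, which for this error is 2 * X.T.dot(X), and compares the error at the solution with the error at nearby weights. For question 2 it duplicates a column of X to see what happens to X.T.dot(X).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 50 samples, 3 features (not from the lecture).
X = rng.normal(size=(50, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X.dot(true_w) + 0.1 * rng.normal(size=50)


def squared_error(w, X, y):
    """Sum of squared residuals for the weight vector w."""
    r = y - X.dot(w)
    return r.dot(r)


# Question 1: solve the linear system obtained by setting the
# partial derivatives of the squared error to 0.
gram = X.T.dot(X)
w_star = np.linalg.solve(gram, X.T.dot(y))

# The matrix of second derivatives (Hessian) of the squared error
# is 2 * X.T.dot(X); inspect its eigenvalues.
hessian_eigs = np.linalg.eigvalsh(2 * gram)
print("Hessian eigenvalues:", hessian_eigs)

# Compare the error at w_star with the error at randomly perturbed weights.
perturbed = [
    squared_error(w_star + rng.normal(scale=0.5, size=3), X, y)
    for _ in range(1000)
]
print("error at w_star:        ", squared_error(w_star, X, y))
print("smallest perturbed error:", min(perturbed))

# Question 2: duplicate a column so the columns of X are linearly dependent.
X_dup = np.hstack([X, X[:, [0]]])
gram_dup = X_dup.T.dot(X_dup)
print("rank of X.T.dot(X):", np.linalg.matrix_rank(gram), "of", gram.shape[0])
print("rank with duplicated column:",
      np.linalg.matrix_rank(gram_dup), "of", gram_dup.shape[0])
```

On this toy data the Hessian eigenvalues come out positive and every perturbed error is larger than the error at `w_star`. Duplicating a column drops the rank of X.T.dot(X) below its size, so it has no inverse in that case.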

Thanks!