In [1]:
import pandas as pd
import numpy as np

In [3]:
df = pd.read_csv('../data/height_weight.csv')

In [4]:
X = df['height'].values
y = df['weight'].values

In [8]:
"""
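X holds a single feature here (height), so it is a 1-D array of shape (n,).
np.c_ treats a 1-D array as a column, so X_bias has shape (n, 2):
a column of ones (for the intercept) followed by the heights.

The same call also works for a 2-D feature matrix. For example, with two features:

X = [[2, 3],
     [4, 5],
     [6, 7]]

X_bias would be:

X_bias = [[1, 2, 3],
          [1, 4, 5],
          [1, 6, 7]]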
"""
X_bias = np.c_[np.ones((X.shape[0], 1)), X]
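
A minimal sketch of the bias-column step on toy inputs, to show that the same `np.c_` call handles both a 1-D feature array and a 2-D feature matrix. The helper name `add_bias_column` and the toy arrays are assumptions made for illustration; they are not part of the notebook.

```python
import numpy as np

def add_bias_column(X):
    """Prepend a column of ones to X (hypothetical helper for illustration)."""
    X = np.asarray(X, dtype=float)
    # np.c_ treats a 1-D array as a single column, so both input shapes work.
    return np.c_[np.ones((X.shape[0], 1)), X]

# Toy inputs, not the height/weight data.
one_feature = np.array([2.0, 4.0, 6.0])              # shape (3,)
two_features = np.array([[2, 3], [4, 5], [6, 7]])    # shape (3, 2)

print(add_bias_column(one_feature).shape)   # one column of ones plus one feature
print(add_bias_column(two_features))        # matches the X_bias example in the cell above
```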

Formula:


θ = (X<sup>T</sup>X)<sup>-1</sup>X<sup>T</sup>y


Breakdown:


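- θ is the vector of model parameters that minimizes the cost function (for linear regression, the sum of squared prediction errors). Here it has two entries: the intercept and the slope.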


- X is the matrix of feature values, with each row representing an example and each column a feature. In our context, X is the matrix X_bias, which includes a column of ones to account for the bias term (also called the intercept term) in the linear regression model.


- X<sup>T</sup> is the transpose of matrix X. Transposing a matrix swaps its rows and columns: row i becomes column i.


- y is the vector of target values, with each element corresponding to an example.


- (X<sup>T</sup>X)<sup>-1</sup> is the inverse of the matrix product X<sup>T</sup>X. Inverting a matrix means finding another matrix that, when multiplied by the original matrix, gives the identity matrix.
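
For reference, a short sketch of where the formula comes from. It assumes the cost function is the mean squared error, which the notebook does not state explicitly; the symbols J and m (number of training examples) are introduced here only for this derivation.

```latex
J(\theta) = \frac{1}{m}\,\lVert X\theta - y \rVert^{2}
          = \frac{1}{m}\,(X\theta - y)^{T}(X\theta - y)

\nabla_{\theta} J(\theta) = \frac{2}{m}\,X^{T}(X\theta - y)

\nabla_{\theta} J(\theta) = 0
  \;\Longrightarrow\; X^{T}X\,\theta = X^{T}y
  \;\Longrightarrow\; \theta = (X^{T}X)^{-1}X^{T}y
```

The last step is valid only when X<sup>T</sup>X is invertible.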


Note that the normal equation requires inverting the matrix X<sup>T</sup>X. If X<sup>T</sup>X is singular (not invertible) or nearly singular, the computation can produce large numerical errors. This happens when features are redundant (multicollinearity) or when there are more features than training examples.
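
The following is a hedged sketch of one way to sidestep the explicit inverse. It uses `np.linalg.solve` on the normal equations and `np.linalg.lstsq` as a least-squares solver, which are alternatives to the notebook's `np.linalg.inv` approach rather than the method used here. The data is synthetic (an exact line with made-up coefficients), not the height/weight CSV.

```python
import numpy as np

# Synthetic data, made up for illustration: weights lie exactly on a line.
heights = np.array([60.0, 62.0, 65.0, 68.0, 70.0, 72.0])
weights = -145.0 + 4.0 * heights

X_bias = np.c_[np.ones((heights.shape[0], 1)), heights]

# Normal equation, solved as a linear system instead of forming the inverse.
theta_solve = np.linalg.solve(X_bias.T @ X_bias, X_bias.T @ weights)

# Least-squares solver; it never forms X^T X explicitly.
theta_lstsq, _, rank, _ = np.linalg.lstsq(X_bias, weights, rcond=None)

# Redundant feature: append a duplicate of the heights column.
# X_dup has 3 columns but only rank 2, so X_dup^T X_dup is singular.
X_dup = np.c_[X_bias, heights]
theta_dup, _, rank_dup, _ = np.linalg.lstsq(X_dup, weights, rcond=None)

print(theta_solve, theta_lstsq)   # both recover the intercept and slope
print(rank, rank_dup)             # effective rank reported by lstsq
print(X_dup @ theta_dup)          # predictions still reproduce the weights
```

With the duplicated column the parameters are no longer unique, so `lstsq` returns the minimum-norm solution; the fitted values are still well defined even though individual coefficients are not.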



In [None]:
# .T: transpose, e.g. turns a 2x3 matrix into a 3x2 matrix
# np.linalg.inv: matrix inverse (for a small matrix, the result can be checked by hand with row reduction)
theta = np.linalg.inv(X_bias.T.dot(X_bias)).dot(X_bias.T).dot(y)

In [10]:
theta

array([-145.77332096,    3.95387466])

In [13]:
# Predict the weight for a height of 70: intercept + slope * height

y_pred = theta[0] + theta[1]*70

y_pred

130.9979055236534
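
The same prediction can be written as a matrix product, which extends to several inputs at once. This is a minimal sketch: `theta` is copied from the printed output above, while the `new_heights` values are hypothetical inputs chosen for illustration.

```python
import numpy as np

# Parameters copied from the printed theta output: [intercept, slope].
theta = np.array([-145.77332096, 3.95387466])

# Hypothetical new heights to predict for (70 is the value used above).
new_heights = np.array([65.0, 70.0, 75.0])

# Add the bias column so each row is [1, height], then multiply by theta.
X_new = np.c_[np.ones((new_heights.shape[0], 1)), new_heights]
predictions = X_new @ theta

print(predictions)  # the middle entry corresponds to height 70
```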