15.3. Fitting the Simple Linear Model
We stated earlier in this chapter that when we minimize the average loss over the data:

$$
\frac{1}{n} \sum_{i} \left[ y_i - (\theta_0 + \theta_1 x_i) \right]^2
$$

the best-fitting line has intercept and slope:

$$
\hat{\theta}_0 = \bar{y} - \hat{\theta}_1 \bar{x}
\qquad
\hat{\theta}_1 = r(\mathbf{x}, \mathbf{y}) \frac{SD(\mathbf{y})}{SD(\mathbf{x})}
$$
In this section, we use calculus to derive these results.
With the simple linear model, the mean squared error is a function of two model parameters, the intercept and slope. This means that if we use calculus to find the minimizing parameter values, we need to find the partial derivatives of the MSE with respect to \(\theta_0\) and \(\theta_1\). We can also find these minimizing values through other techniques:
- Gradient descent: We can use numerical optimization techniques, such as gradient descent, when the loss function is more complex and it’s faster to find an approximate solution that’s pretty accurate (see Chapter 20; a minimal sketch follows this list).
- Quadratic formula: Since the average loss is a quadratic function of \(\theta_0\) and \(\theta_1\), we can use the quadratic formula (along with some algebra) to solve for the minimizing parameter values.
- Geometric argument: Later in this chapter, we use a geometric interpretation of least squares to fit multiple linear models. This approach relates to the Pythagorean theorem and has several intuitive benefits.
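To make the first alternative concrete, here is a minimal sketch of gradient descent applied to the MSE of the simple linear model. It is illustrative only: the simulated data, learning rate, and iteration count are assumptions, not part of the analysis in this chapter.

```python
import numpy as np

def fit_by_gradient_descent(x, y, lr=0.001, n_iter=100_000):
    """Minimize the MSE of y ≈ theta0 + theta1 * x by gradient descent (sketch)."""
    theta0, theta1 = 0.0, 0.0
    for _ in range(n_iter):
        resid = y - (theta0 + theta1 * x)
        grad0 = -2 * np.mean(resid)       # partial derivative of MSE w.r.t. theta0
        grad1 = -2 * np.mean(resid * x)   # partial derivative of MSE w.r.t. theta1
        theta0 -= lr * grad0
        theta1 -= lr * grad1
    return theta0, theta1

# Hypothetical usage with simulated data
rng = np.random.default_rng(42)
x_sim = rng.uniform(0, 10, size=100)
y_sim = 3 + 2 * x_sim + rng.normal(0, 1, size=100)
theta0_hat, theta1_hat = fit_by_gradient_descent(x_sim, y_sim)
```

The gradients in the loop are the same partial derivatives (up to the \(1/n\) factor) that we derive next, so the descent steps move toward the closed-form solution.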
We choose calculus to optimize the simple linear model since it is quick and straightforward. To begin, we take the partial derivatives of the sum of squared errors with respect to each parameter (we can ignore the \(1/n\) in the MSE because it doesn’t affect the location of the minimum):

$$
\begin{aligned}
\frac{\partial}{\partial \theta_0} \sum_{i} \left[ y_i - (\theta_0 + \theta_1 x_i) \right]^2
&= \sum_{i} 2 \left[ y_i - (\theta_0 + \theta_1 x_i) \right] (-1) \\
\frac{\partial}{\partial \theta_1} \sum_{i} \left[ y_i - (\theta_0 + \theta_1 x_i) \right]^2
&= \sum_{i} 2 \left[ y_i - (\theta_0 + \theta_1 x_i) \right] (-x_i)
\end{aligned}
$$
Then we set the partial derivatives equal to 0 and simplify a bit by multiplying both sides of the equations by \(-1/2\) to get:

$$
\begin{aligned}
\sum_{i} \left[ y_i - (\hat{\theta}_0 + \hat{\theta}_1 x_i) \right] &= 0 \\
\sum_{i} \left[ y_i - (\hat{\theta}_0 + \hat{\theta}_1 x_i) \right] x_i &= 0
\end{aligned}
$$
These equations are called the normal equations. In the first equation, we see that \(\hat{\theta}_0\) can be represented as a function of \(\hat{\theta}_1\):

$$
\hat{\theta}_0 = \bar{y} - \hat{\theta}_1 \bar{x}
$$
Plugging this value into the second equation gives us:

$$
\sum_{i} \left[ (y_i - \bar{y}) - \hat{\theta}_1 (x_i - \bar{x}) \right] x_i = 0
$$
After some algebra, we can represent \(\hat{\theta}_1\) in terms of quantities that we are familiar with:

$$
\hat{\theta}_1 = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i} (x_i - \bar{x})^2}
= r(\mathbf{x}, \mathbf{y}) \frac{SD(\mathbf{y})}{SD(\mathbf{x})}
$$
As shown earlier in this chapter, this representation says that a point on the fitted line at \(x\) can be written as follows:

$$
\hat{y} = \hat{\theta}_0 + \hat{\theta}_1 x
= \bar{y} + r(\mathbf{x}, \mathbf{y}) \, SD(\mathbf{y}) \frac{x - \bar{x}}{SD(\mathbf{x})}
$$
We have derived the equation for the least squares line that we used in the previous section. There, we used the pandas built-in methods to compute \(SD(\mathbf{x})\), \(SD(\mathbf{y})\), and \(r(\mathbf{x}, \mathbf{y})\) to easily calculate the equation for this line.
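That calculation might look something like the following sketch. It assumes the GA data frame with the AQS and PurpleAir PM2.5 columns used in the fitting code below; the particular std and corr calls are our assumption about how the summary statistics were computed.

```python
# Closed-form estimates from summary statistics (sketch)
aqs, pa = GA['pm25aqs'], GA['pm25pa']

theta1_hat = aqs.corr(pa) * pa.std() / aqs.std()  # slope: r(x, y) * SD(y) / SD(x)
theta0_hat = pa.mean() - theta1_hat * aqs.mean()  # intercept: mean(y) - slope * mean(x)
```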
However, in practice we recommend using the functionality provided in scikit-learn to do the model fitting:
from sklearn.linear_model import LinearRegression
y = GA['pm25pa']
x = GA[['pm25aqs']]
reg = LinearRegression().fit(x, y)
Our fitted model is:
print(f"Model: PA estimate = {reg.intercept_:.2f} + {reg.coef_[0]:.2f}AQS")
Model: PA estimate = -3.36 + 2.10AQS
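Once fit, the model can make predictions with the fitted object's predict method, which expects the explanatory features in the same form used for fitting (here, a data frame with a pm25aqs column). The AQS values below are made-up numbers purely for illustration.

```python
import pandas as pd

# Hypothetical new AQS readings to convert to PurpleAir-scale estimates
new_aqs = pd.DataFrame({'pm25aqs': [5.0, 12.0, 20.0]})
pa_estimates = reg.predict(new_aqs)
```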
Notice that we provided y as an array and x as a data frame to LinearRegression. We will soon see why when we fit multiple explanatory features in a model.
The LinearRegression method offers numerically stable algorithms to fit linear models by least squares. This is especially important when fitting multiple variables, which we introduce next.