Linear regression

What is it

A linear regression models the relationship between a dependent variable and one (simple linear regression) or more (multiple linear regression) independent variables through a linear predictor: the assumption is that the relationship between them is linear.
For the code here, you need a few imports:
import pandas as pd
from sklearn.linear_model import LinearRegression
from matplotlib import pyplot as plt

Simple

In the simple one-dimensional case, we model the dependency as

$$y = \alpha x + \beta \ ,$$

$\alpha$ (the slope of the line) and $\beta$ (the intercept) being the coefficients we want to compute. What we mean by this is that in reality we assume

$$y = \alpha x + \beta + \epsilon \ ,$$

expecting the error $\epsilon$ to be "small".

Multiple

In the case of a multiple linear regression, we would have the line (let's say we have
pp
variables, that is, features):
y=w0+wx ,y = w_0 + \mathbf{w} \cdot \mathbf{x} \ ,
where
w\mathbf{w}
is the vector of parameters
w=[w1,w2,,wp] ,\mathbf w = [w_1, w_2, \ldots, w_p] \ ,
and
x\mathbf{x}
the features
x=[x1x2xp] .\mathbf x = \begin{bmatrix} x_1\\ x_2\\ \ldots\\ x_p \end{bmatrix} \ .
For convenience, we can write the model as
y=wx ,y = \mathbf w \cdot \mathbf x \ ,
where we have set
x0=1x_0 = 1
.
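A tiny sketch of this vector form with NumPy (the numbers are hypothetical, purely for illustration), where a 1 is prepended to the features so that $w_0$ is treated like any other coefficient:

import numpy as np

w = np.array([0.5, 2.0, -1.0])            # [w_0, w_1, w_2], hypothetical values
x_features = np.array([3.0, 1.5])         # [x_1, x_2], a hypothetical sample
x = np.concatenate(([1.0], x_features))   # prepend x_0 = 1

y = np.dot(w, x)                          # y = w_0 + w_1*x_1 + w_2*x_2
print(y)                                  # 0.5 + 2.0*3.0 - 1.0*1.5 = 5.0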
Because we would have several (let's say $n$) observations (sample data points), each feature $x_j$, with $j \in \{1, \ldots, p\}$, and the target $y$ are vectors in $\mathbb R^n$. We will denote the $j$-th feature of the $i$-th sample by $x_i^j$, the $j$-th coefficient by $w_j$, and the target variable of the $i$-th sample by $y_i$.

Estimators: Ordinary Least Squares (OLS)

The problem is that of estimating the parameters that best fit the data under the assumptions of the model. There are several methods to do that; OLS is the most commonly used and also the simplest one.
The cost function of an OLS is the sum of the squared residuals between the observed values of the dependent variable and the model predictions:

$$E(\mathbf w) = \sum_{i=1}^{n} (y_i - \mathbf w \cdot \mathbf x_i)^2$$

(the vector operations are in the feature space). In extended form, the cost function is

$$E(\mathbf w) = \sum_{i=1}^{n} \Big(y_i - \sum_{j=1}^{p} w_j x_i^j\Big)^2 \ ,$$

or, in a short form,

$$E(\mathbf w) = ||\mathbf y - X \mathbf w||^2 \ ,$$

where $X$ is the matrix whose $i$-th row is $\mathbf x_i$. This function has to be minimised over the parameters, so the problem becomes solving

$$\min_{\mathbf w} E(\mathbf w) \ ,$$

which can be tackled via Gradient Descent (see the dedicated page).
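As a complement, here is a minimal sketch of solving this least-squares problem directly with NumPy, as an alternative to Gradient Descent (the numbers are hypothetical; the column of ones accounts for the intercept $w_0$):

import numpy as np

# Toy data: n = 4 samples, p = 2 features (hypothetical values)
X_raw = np.array([[1.0, 2.0],
                  [2.0, 0.5],
                  [3.0, 1.5],
                  [4.0, 3.0]])
y = np.array([3.1, 3.9, 5.2, 7.8])

# Prepend a column of ones so that w[0] plays the role of the intercept w_0
X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])

# Minimise ||y - X w||^2; lstsq returns the least-squares solution
w, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print('Fitted parameters [w_0, w_1, w_2]:', w)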
If for the sake of simplicity we put ourselves in just one dimension (one feature, so that $x$ is a single variable), we would have

$$E(\alpha, \beta) = \sum_{i=1}^{n} (y_i - (\alpha x_i + \beta))^2 \ ,$$

so we'd have to solve the problem

$$\min_{\alpha, \beta} E(\alpha, \beta) \ ,$$

which Gradient Descent tackles by repeatedly moving $\alpha$ and $\beta$ against the gradient, whose components are

$$\begin{cases} \frac{\partial E}{\partial \alpha} = -2 \sum_{i=1}^{n} (y_i - (\alpha x_i + \beta)) \, x_i \\ \frac{\partial E}{\partial \beta} = -2 \sum_{i=1}^{n} (y_i - (\alpha x_i + \beta)) \end{cases}$$
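Below is a minimal sketch of this Gradient Descent loop, on made-up data and with a hand-picked learning rate and number of iterations (all of these are assumptions for illustration, not part of the example further down):

import numpy as np

# Toy one-dimensional data (hypothetical values, for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

alpha, beta = 0.0, 0.0   # initial slope and intercept
lr = 0.01                # learning rate (assumed; may need tuning)

for _ in range(5000):
    residuals = y - (alpha * x + beta)
    # Gradient components derived above
    grad_alpha = -2 * np.sum(residuals * x)
    grad_beta = -2 * np.sum(residuals)
    # Move against the gradient
    alpha -= lr * grad_alpha
    beta -= lr * grad_beta

print('Slope:', alpha, 'Intercept:', beta)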

An example

We will use a classic dataset, head size and brain weight (see the references). Download the file, put it in the same folder as your code and import it with Pandas:

df = pd.read_csv('head_size_brain_weight.csv')
Let's then run a linear regression (using the routine in sklearn and trying to predict the brain weight given the head size), plotting the resulting line and giving the fitted parameters:
# Num samples
n = df.count()['Head_size(cm^3)']

# Invoking the regressor (fit the intercept as well)
lr = LinearRegression(fit_intercept=True)

# Getting x as the head size column and y as the brain weight column
# Reshaping x from (num_rows,) to (num_rows, 1) for the regressor fit to work
# (needed when using only one feature, as the fit method expects a matrix)
x = df['Head_size(cm^3)'].to_numpy().reshape(n, 1)
y = df['Brain_weight(g)'].to_numpy()

# Fit the model
fit = lr.fit(x, y)

# Plot the data and the fitted line
plt.scatter(x, y, color='black')
plt.plot(x, fit.predict(x), color='blue')
plt.xlabel('Head size (cm^3)')
plt.ylabel('Brain weight (g)')
plt.show()

# Display the fitted slope and intercept of the fitting line
print('Slope of the fit: ', fit.coef_)
print('Intercept of the fit: ', fit.intercept_)
Fitted parameters turn out to be 0.26 for the slope and 325.5 for the intercept, and this is the resulting line:
Fitting a linear regression for the head size and brain weight dataset.
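As a quick cross-check (an addition here, not part of the original snippet), the same slope and intercept can be recovered with NumPy's polyfit:

import numpy as np
import pandas as pd

df = pd.read_csv('head_size_brain_weight.csv')

# Fit a degree-1 polynomial: np.polyfit returns [slope, intercept]
slope, intercept = np.polyfit(df['Head_size(cm^3)'], df['Brain_weight(g)'], 1)
print('Slope:', slope, 'Intercept:', intercept)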

References

  1. Notes on linear regression from the Stanford ML course by A. Ng
  2. The head size and brain weight dataset, data from R. J. Gladstone, "A study of the relations of the brain to the size of the head", Biometrika, 4, 105-123 (1905)