Traditional linear regression (a.k.a. Ordinary Least Squares) is the simplest and most classic form of regression.
Given a linear model of the form:
\begin{align} f(\mathbf{x}) & = w_0 + w_1x_1 + w_2x_2 + \dots \\ & = \mathbf{w} ^T \cdot \mathbf{x} \end{align}
Linear regression finds the parameters \(\mathbf{w}\) that minimise the mean squared error (MSE).
The MSE is the mean of the squared differences between the predicted values and the actual values.
\begin{align} MSE(\mathbf{w}) = \frac{1}{m} \sum_{i=1}^{m}{(\mathbf{w}^T \cdot \mathbf{x}^{(i)} - y^{(i)})^2} \end{align}
Where \(m\) is the number of observations.
We could also use a measure of how good a fit is (a utility or fitness function).
The typical use of a cost function is to measure how badly a model fits or classifies the data; the model is then tweaked to minimise the cost function. For linear regression we often use the distance between the model's predictions and the real data, i.e. the MSE in the previous example.
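As a minimal sketch (assuming the data are held in NumPy arrays, with a bias column already included in `X`), the MSE cost function above can be written directly:

```python
import numpy as np

def mse(w, X, y):
    """Mean squared error of a linear model w over observations X and targets y."""
    predictions = X @ w              # w^T x for every observation
    return np.mean((predictions - y) ** 2)
```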
There is a closed-form solution that finds the parameters minimising the MSE for a given dataset. It's called the Normal equation.
\begin{align} \hat{\mathbf{w}}=\left(\mathbf{X}^T\cdot\mathbf{X}\right)^{-1} \cdot \mathbf{X}^T \cdot \mathbf{y} \end{align}
Here \(\mathbf{X}\) is the matrix whose rows are the observations and \(\mathbf{y}\) is the vector of targets.
Let’s generate some synthetic data and run through this manually.
Most of the time you will use a library, but sometimes you need to dig deeper.
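A rough sketch of this in NumPy might look as follows; the intercept and slope used to generate the synthetic data (4 and 3) are illustrative assumptions, not values from the text:

```python
import numpy as np

# Synthetic data: y = 4 + 3x + Gaussian noise (illustrative values)
rng = np.random.default_rng(42)
m = 100
x = 2 * rng.random((m, 1))
y = 4 + 3 * x + rng.normal(size=(m, 1))

# Prepend a column of ones so the bias w_0 is learned alongside the other weights
X = np.c_[np.ones((m, 1)), x]

# Normal equation: w_hat = (X^T X)^{-1} X^T y
w_hat = np.linalg.inv(X.T @ X) @ X.T @ y
print(w_hat)  # should be close to [4, 3]
```

In practice `np.linalg.pinv` or a library solver is preferred, since inverting \(\mathbf{X}^T\mathbf{X}\) directly can be numerically unstable.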
Most standard models and optimisers are already implemented, and during your day-to-day work you will use those. But working through the maths by hand aids understanding, and occasionally you will need a custom implementation.
Other linear regression implementations exist. Ridge regression optimises the same parameters but adds a penalty on the squared (L2) magnitudes of the weights.
Effectively it shrinks the weights, discouraging the model from relying too heavily on any one feature. This is called regularisation.
Ridge regression is most useful when there are large numbers of features.
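A sketch of how this might look with scikit-learn, reusing the `x` and `y` arrays from the synthetic-data example above (the alpha value is an illustrative assumption):

```python
from sklearn.linear_model import Ridge

# alpha controls the strength of the L2 penalty on the weights
ridge = Ridge(alpha=1.0)
ridge.fit(x, y.ravel())
print(ridge.intercept_, ridge.coef_)
```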
Another type of regression is called Lasso. It is similar to Ridge regression but penalises the L1 (absolute) magnitudes of the weights.
We can see that the penalty is too aggressive. We can alter the alpha parameter to reduce the aggressiveness of the penalisation (the same applies to Ridge).
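A comparable sketch with Lasso, again reusing `x` and `y` from above; the two alpha values are illustrative, chosen only to contrast an aggressive penalty with a gentler one:

```python
from sklearn.linear_model import Lasso

# A large alpha penalises heavily and can shrink weights all the way to zero;
# a smaller alpha makes the penalisation less aggressive (the same applies to Ridge).
aggressive = Lasso(alpha=1.0).fit(x, y.ravel())
gentle = Lasso(alpha=0.01).fit(x, y.ravel())
print(aggressive.coef_, gentle.coef_)
```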
Generally, Ridge regression is simpler to use but assumes Gaussian noise. Lasso performs better when there are outliers (due to the L1 penalisation, rather than L2).
There is one final common form of regularisation and that is called Elastic Net. In essence, it is a combination of both Ridge and Lasso regression, where you can control the level of influence from each.
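A sketch using scikit-learn's ElasticNet, where the `l1_ratio` parameter controls the mix between the L1 (Lasso) and L2 (Ridge) penalties; the values here are illustrative assumptions:

```python
from sklearn.linear_model import ElasticNet

# l1_ratio = 0 is pure Ridge, 1 is pure Lasso; 0.5 is an even mix
elastic = ElasticNet(alpha=0.1, l1_ratio=0.5)
elastic.fit(x, y.ravel())
print(elastic.intercept_, elastic.coef_)
```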
We can see that the effect is quite pronounced.
RANSAC (Random Sample Consensus) repeatedly picks a random subset of the observations and performs regression on that subset as normal.
The set of parameters with the largest consensus (i.e. agreeing with the most observations) is chosen as the final result.
This algorithm works really well when there are outliers in the data.
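A sketch of how this might be done with scikit-learn's RANSACRegressor wrapping an ordinary linear model; the injected outliers are illustrative, and it again reuses `x` and `y` from above:

```python
from sklearn.linear_model import LinearRegression, RANSACRegressor

# Corrupt a few observations to simulate outliers (illustrative)
y_outliers = y.ravel().copy()
y_outliers[:5] += 30

ransac = RANSACRegressor(LinearRegression())
ransac.fit(x, y_outliers)
print(ransac.estimator_.intercept_, ransac.estimator_.coef_)
```

The fitted `estimator_` holds the model trained on the consensus set, so the recovered parameters should stay close to the true intercept and slope despite the outliers.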