After building a machine learning model, whether with simple linear regression or gradient boosting, it is important to get an idea of just how well your model performs.
This article is designed to give you an overview of some of the most common model evaluation methods for regression models along with their advantages and disadvantages.
Regression Problems

For regression problems, where we are building a model to predict numbers such as a house price or temperature, we can use the following metrics to evaluate our model:
Mean Squared Error (MSE)
The mean squared error takes the sum of squared distances between the model’s predicted values and actual values and divides this sum by the number of test examples.
The MSE takes the following formula:

MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)²

Where:
- n is the number of test examples
- yᵢ is the actual value for test example i
- ŷᵢ is the model’s predicted value for test example i
To demonstrate how the mean squared error is calculated, suppose the actual values are y = [3, 5, 2] and the model predicts ŷ = [2, 5, 4]. The errors are [1, 0, −2], the squared errors are [1, 0, 4], and so MSE = (1 + 0 + 4) / 3 ≈ 1.67.
Since we square each error, larger errors are penalized more heavily than smaller ones, so the mean squared error is influenced heavily by large errors (outliers).
Squaring also turns negative errors positive, so they do not cancel out positive errors in the sum.
A lower value of MSE shows better model performance.
Advantage
- The mean squared error is a differentiable function and can be used as a loss function which can be minimised.
Disadvantages
- The error can be quite difficult to interpret: what value of MSE is considered acceptable?
- Affected by outliers
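To make the calculation concrete, here is a minimal sketch of MSE in NumPy (the y_true and y_pred values are made up for illustration):

```python
import numpy as np

# Made-up actual and predicted values for illustration
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# Mean squared error: average of the squared differences
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # 1.3125
```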
Root Mean Squared Error (RMSE)
The root mean squared error is simply the square root of the mean squared error:

RMSE = √MSE = √( (1/n) Σᵢ (yᵢ − ŷᵢ)² )
Since we are still squaring errors, large errors have a big effect on the final RMSE.
Advantages
- Differentiable — can be used as a loss function to be minimised.
- Shares the same unit as y (if our predicted values are in the hundreds, RMSE will also be in the hundreds) — can be easier to interpret than MSE.
Disadvantage
- As with MSE, RMSE is affected by outliers which may show worse model performance than if the model was fitted on data excluding outliers.
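A short sketch, reusing the made-up values from the MSE example, shows that RMSE is just the square root of MSE and so lands back in the units of y:

```python
import numpy as np

# Same made-up values as in the MSE example
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

mse = np.mean((y_true - y_pred) ** 2)
rmse = np.sqrt(mse)  # back in the same units as y
print(rmse)
```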
Mean Absolute Error (MAE)
The mean absolute error takes the average of the absolute distances between the model’s predicted values and the actual values:

MAE = (1/n) Σᵢ |yᵢ − ŷᵢ|

The absolute value |·| turns negative errors into positive ones.
For example: |−4| = 4
This is done so that negative errors do not cancel out positive errors when summing.
Advantage
- We do not square the errors, so MAE is not heavily affected by outliers.
Disadvantage
- The MAE is not differentiable everywhere (the absolute value has no derivative at zero) and therefore, in some cases, cannot be used directly as a loss function in regression models.
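The contrast with MSE is easiest to see on data containing an outlier. In this made-up sketch, one prediction is off by 10 while the rest are close; the outlier dominates MSE far more than MAE:

```python
import numpy as np

# Made-up data with one large outlier error in the last example
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 17.0])

mae = np.mean(np.abs(y_true - y_pred))  # |errors| = [0.5, 0, 2, 10]
mse = np.mean((y_true - y_pred) ** 2)   # squared errors = [0.25, 0, 4, 100]
print(mae, mse)  # the single outlier inflates MSE much more than MAE
```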
R-Squared Score (R²)
The R² score is given by the following formula:

R² = 1 − Σᵢ (yᵢ − ŷᵢ)² / Σᵢ (yᵢ − ȳ)²

Where:
- yᵢ is the actual value of observation i
- ŷᵢ is the predicted value of observation i
- ȳ is the mean of the actual values of y
We can imagine a simple model that always predicts the mean value of the target variable y, that is ŷᵢ = ȳ. For this model the numerator and denominator are equal, so we would get R² = 1 − 1 = 0, which indicates a poor model.
R² can even take a negative value, showing a model that performs worse than the simple model that predicts the mean of y for every observation yᵢ.
A model with no errors, that is ŷᵢ = yᵢ for every observation, would get an R² score of R² = 1 − 0 = 1, showing a model that predicts all values of y correctly.
The R² score therefore ranges from 1 (best model performance) down to negative values, which indicate poor model performance.
Advantage
- Can be used to compare models more easily than MSE and RMSE, since MSE and RMSE can vary massively when the data is on different scales.
Disadvantage
- We can overfit our data by including many variables in our model, such as polynomial variables. Models which overfit the data are not penalized: a model with many high-degree polynomial terms, for example, can have an R² score very close or equal to 1 while clearly overfitting our data.
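The formula above can be sketched directly in NumPy. This example (made-up values again) also checks the baseline claim from earlier: a model that always predicts the mean of y scores exactly 0:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
r2 = 1 - ss_res / ss_tot

# Baseline model that always predicts the mean of y scores exactly 0
baseline = np.full_like(y_true, np.mean(y_true))
r2_baseline = 1 - np.sum((y_true - baseline) ** 2) / ss_tot
print(r2, r2_baseline)
```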
Adjusted R-Squared Score

Adjusted R² = 1 − (1 − R²) (n − 1) / (n − k − 1)

Where:
- n is the number of observations
- k is the number of independent variables in the model
- R² is the model’s R² score
The purpose of the adjusted R² score is to penalize a model that includes too many independent variables that are not significant in predicting the outcome y (the dependent variable). We can imagine that as k increases, the fraction (n − 1) / (n − k − 1) increases and the adjusted R² score decreases.
Advantage
- Has an advantage over the R² score since the adjusted R² score penalizes the addition of too many variables to the model. A low adjusted R² score can help indicate if a model is overfitting (see Episode 5).
Disadvantage
- Models with bias can still have a relatively high adjusted R² score.
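A small sketch of the adjustment, using hypothetical numbers (R² = 0.9 from a model fitted on n = 30 observations), shows the penalty growing as more variables are added:

```python
# Hypothetical values: R² of 0.9, n = 30 observations, k = 5 predictors
r2, n, k = 0.9, 30, 5

adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Same fit quality but with k = 15 predictors: the score drops
adj_r2_more = 1 - (1 - r2) * (n - 1) / (n - 15 - 1)
print(adj_r2, adj_r2_more)
```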
Choosing a metric to evaluate your regression model
- Split your data into training and test data
- Use R² score or adjusted R² score to evaluate your model’s performance on the training data.
- Evaluate the model’s performance on the test data. A large drop in performance between training and test data may be due to overfitting. If your model performs poorly on both the training and the test data, it may be underfitting; try adding more variables to your model.
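The steps above can be sketched end to end with NumPy, using made-up synthetic data and a simple least-squares line fit (np.polyfit) in place of a real model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up synthetic data: y depends linearly on x plus noise
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, size=100)

# Step 1: split into training and test sets (80/20)
x_train, x_test = x[:80], x[80:]
y_train, y_test = y[:80], y[80:]

# Fit a simple linear model on the training data only
slope, intercept = np.polyfit(x_train, y_train, 1)

def r2_score(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

# Step 2: R² on the training data; Step 3: R² on the held-out test data
r2_train = r2_score(y_train, slope * x_train + intercept)
r2_test = r2_score(y_test, slope * x_test + intercept)
print(r2_train, r2_test)  # a large gap between the two would suggest overfitting
```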
If you have any questions please leave them below!
