Solving Underfitting and Overfitting

Explaining and fixing badly fitting models

Underfitting and overfitting are two common problems data scientists come across when evaluating their models. It is important to be aware of these issues and to know what we can do to resolve them.

Definitions

Underfitting: Occurs when our model fails to capture the underlying trend in our data.

Models which underfit our data:

  • Have a Low Variance and a High Bias
  • Tend to have fewer features [ 𝑥 ]
  • High-Bias: Assumes more about the form or trend our data takes
  • Low Variance: Changes to our data make only small changes to our model’s predicted values

— — — — — — — — — — — — — — — — — — — — —

Overfitting: Occurs when our model captures not only the underlying trend but also the noise in our data, and therefore fails to generalize to new data.

Models which overfit our data:

  • Have a High Variance and a Low Bias
  • Tend to have many features [𝑥, 𝑥², 𝑥³, 𝑥⁴, …]
  • High Variance: Changes to our data make large changes to our model’s predicted values
  • Low Bias: Assumes less about the form or trend our data takes

— — — — — — — — — — — — — — — — — — — — —

A good fit: Does not overfit or underfit our data and captures the general trend of our data.

Models which fit our data well:

  • Have a Low Variance and Low Bias
  • Tend to have a reasonable number of features
  • Perform well on test data [new data given to the model]

Bias and Variance Trade-off

  • Increasing a model’s variance [allowing more flexibility] tends to decrease its bias.
  • Increasing a model’s bias [making stronger assumptions] tends to decrease its variance.

To achieve a model that fits our data well, with both a low variance and a low bias, we need to balance these two competing effects.

We reach our optimal model, with the lowest error rate, when both our bias and variance are relatively low.
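
To make the trade-off concrete, here is a minimal sketch [assuming NumPy and scikit-learn, with synthetic data invented purely for illustration] that fits polynomials of increasing degree to the same noisy data and compares training and test error:

# Minimal sketch: moving along the bias-variance trade-off by varying
# model complexity (polynomial degree). The data here is synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, (100, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=100)  # smooth trend + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):  # underfit, reasonable fit, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree:2d}: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")

Degree 1 shows the high-bias pattern [poor error on both sets], while degree 15 shows the high-variance pattern [very low training error, noticeably worse test error].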

When our Model’s Bias is too high:

Symptoms:

  • Very high Mean Squared Error on our training data
  • Performs poorly on our test data [new data given to the model]

Solution:

  • Add more features to our model [perhaps include 𝑥³, 𝑥⁴ or new variables 𝑥₂ or 𝑥₃; see the sketch below]
  • Decrease the amount of regularization applied to the model [see the Regularization section below]
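
As a sketch of the first fix [synthetic data again; PolynomialFeatures is simply one convenient way to add the 𝑥², 𝑥³, … terms], adding features lets the model capture a trend that a straight line misses:

# Sketch: fixing an underfit (high bias) model by adding polynomial features.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(1)
X = rng.uniform(-2, 2, (200, 1))
y = 1.0 + 2.0 * X[:, 0] ** 3 + rng.normal(scale=0.5, size=200)  # cubic trend

line = LinearRegression().fit(X, y)  # feature set [x] only: underfits
cubic = make_pipeline(PolynomialFeatures(3), LinearRegression()).fit(X, y)

print("straight line MSE:", round(mean_squared_error(y, line.predict(X)), 3))
print("cubic model MSE:  ", round(mean_squared_error(y, cubic.predict(X)), 3))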

Having a high bias is often more problematic than having a high variance: the model’s predictions can be badly inaccurate, and reducing bias [by allowing more variance] often takes more effort.

When our Model’s Variance is too high:

Symptoms:

  • Very low Mean Squared Error on our training data
  • Performs poorly on our test data [new data given to the model]

Solution:

  • Reduce the number of features used in the model
  • Apply Regularization [see the sketch below]
  • Increase the amount of training data
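
A sketch of the regularization fix [again with invented data; scikit-learn’s Ridge applies the L2 regularization discussed in the next section, and names the λ parameter “alpha”]:

# Sketch: taming a high-variance model with regularization.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(2)
X = rng.uniform(-3, 3, (60, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=60)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Deliberately give the model far too many features, then regularize.
overfit = make_pipeline(PolynomialFeatures(12), LinearRegression()).fit(X_train, y_train)
ridge = make_pipeline(PolynomialFeatures(12), Ridge(alpha=1.0)).fit(X_train, y_train)

print("unregularized test MSE:", round(mean_squared_error(y_test, overfit.predict(X_test)), 3))
print("regularized test MSE:  ", round(mean_squared_error(y_test, ridge.predict(X_test)), 3))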

Having a high variance is less problematic than a high bias, since we can apply techniques such as regularization to prevent our model from overfitting our data.

Regularization

Regularization reduces our model’s parameter values [θ₁, θ₂, θ₃, … , but not θ₀] by adjusting the gradient descent algorithm, which decreases the model’s overall variance.

Example:

  • Reducing a parameter θ from 2 to 0.5 flattens that term’s contribution to the model, which reduces the model’s variance
  • In general, reducing the value of our model’s parameters reduces the model’s variance

Regularization reduces our parameter values by making the following adjustments to our gradient descent algorithm.

Changing the value of θ₀ has no effect on our overall variance, so θ₀ is updated exactly as in ordinary gradient descent:
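
For linear regression with the usual squared-error cost [where h(x⁽ⁱ⁾) is the model’s prediction for the i-th training example and α is the learning rate], the θ₀ update is simply:

θ₀ := θ₀ − α · (1/m) · Σᵢ₌₁ᵐ ( h(x⁽ⁱ⁾) − y⁽ⁱ⁾ )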

We update the parameters θ₁, θ₂, …, θₙ by:
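
θⱼ := θⱼ − α · [ (1/m) · Σᵢ₌₁ᵐ ( h(x⁽ⁱ⁾) − y⁽ⁱ⁾ ) · xⱼ⁽ⁱ⁾ + (λ/m) · θⱼ ]    for j = 1, 2, …, n

Equivalently:

θⱼ := θⱼ · (1 − α · λ/m) − α · (1/m) · Σᵢ₌₁ᵐ ( h(x⁽ⁱ⁾) − y⁽ⁱ⁾ ) · xⱼ⁽ⁱ⁾

so every update multiplies θⱼ by a factor slightly less than 1 [assuming α · λ/m < 1].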

The (λ/m) · θⱼ term is the regularization element of our algorithm: it shrinks the value of our parameters on every update, thereby decreasing variance.

λ is set by us and is referred to as the complexity parameter or regularization parameter:

  • Increasing the value of λ further reduces the value of θ
  • This further reduces our model’s variance

m is the number of training examples
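
Both points above are easy to verify with a sketch [using scikit-learn’s Ridge, which names the regularization parameter alpha rather than λ]:

# Sketch: increasing the regularization parameter shrinks the parameters theta.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(3)
X = rng.uniform(-2, 2, (100, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=100)
X_poly = PolynomialFeatures(8, include_bias=False).fit_transform(X)

# Larger lambda -> smaller parameter values theta.
for lam in (0.01, 1, 100, 1e10):
    model = Ridge(alpha=lam).fit(X_poly, y)
    print(f"lambda = {lam:>12}: largest |theta| = {np.abs(model.coef_).max():.6f}")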

Example:

Applying regularization to a high variance model with many polynomial features, for example:

h(x) = θ₀ + θ₁𝑥 + θ₂𝑥² + θ₃𝑥³ + θ₄𝑥⁴

Setting λ = 10¹⁰ will significantly reduce all of our parameters θ₁, θ₂, …, θₙ to ≈ 0, leaving us with the high bias model:

h(x) ≈ θ₀

It is important that we do not set our complexity parameter λ too high, as this will result in a high bias model, and not too low, as this will result in a high variance model.

A good value of λ is unfortunately data dependent, so you have to do some tuning to see what produces a model that performs well on test data.

  • Try λ = 0.1, 1, 10, 100, 1000, see what effect each value has on your model’s error rate, and then narrow down a good value for λ [a sketch of this search follows below].
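
A sketch of that search [invented data once more; scikit-learn’s RidgeCV can automate a similar search via cross-validation]:

# Sketch: trying a range of lambda values and keeping the best one.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(4)
X = rng.uniform(-3, 3, (150, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=150)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

best_lam, best_mse = None, float("inf")
for lam in (0.1, 1, 10, 100, 1000):
    model = make_pipeline(PolynomialFeatures(10), Ridge(alpha=lam))
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"lambda = {lam:>6}: test MSE = {mse:.3f}")
    if mse < best_mse:
        best_lam, best_mse = lam, mse

print("best lambda in this range:", best_lam)

Strictly speaking, this search should use a separate validation set, reserving the test set for a final check of the chosen model.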

Summary

  • Underfitting occurs when our model fails to capture the underlying trend in our data
  • Overfitting occurs when our model includes too much “noise” and fails to capture the general trend
  • A good fit neither underfits nor overfits our data and has a low bias and low variance
  • Regularization reduces the variance of our model by reducing parameter values
If you have any questions, please leave them below!
