Logistic regression can be thought of as an extension of linear regression. With linear regression, our model's final output was a single continuous value; with logistic regression, we apply an extra function to that value so that the final output falls into a group, i.e. 1 or 0.
What is Logistic Regression?
Logistic regression is a very common supervised machine learning algorithm (see Episode 3) used by Data Scientists to categorize data into groups.
Overview
The job of logistic regression is to take a bunch of input data and organise it into different groups. For example, take a look at the following table of weather data gathered from Albury, Australia:

- Our objective is to predict whether it will rain tomorrow in Albury.
- Our model, therefore, has two group outputs: 0 – No or 1 – Yes.
- Since our model has group outputs we use logistic regression to achieve our objective.
We build a logistic regression model that:
1. Calculates a value (z) from the weather data using linear regression:
z = θ₀ + θ₁𝑥₁ + θ₂𝑥₂ + … + θ₈𝑥₈
𝑥₁, 𝑥₂, …, 𝑥₈ are our weather inputs: Min temp, Max temp, …, RainToday
θ₀, θ₁, θ₂, …, θ₈ are initially set to random values, as with linear regression.
2. Converts this value (z) into a probability between 0 and 1 of it raining tomorrow (using the logistic function):
hθ(𝑥) = σ(z) = 1 / (1 + e⁻ᶻ)
3. Places this probability into a group.
For example:
If this probability is ≥50% then our model should predict yes (1).
If this probability is <50% our model should predict no (0). This is called a boundary condition.

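The three steps above can be sketched in code. This is a minimal, hypothetical illustration (the feature values and parameters below are made up, not taken from the Albury dataset):

```python
import numpy as np

def predict_rain(x, theta):
    """Predict rain tomorrow (1) or not (0) from weather features.

    x     : array of 8 weather inputs (min temp, max temp, ..., rain today)
    theta : array of 9 parameters (theta[0] is the intercept term θ₀)
    """
    z = theta[0] + np.dot(theta[1:], x)      # step 1: linear regression value
    probability = 1.0 / (1.0 + np.exp(-z))   # step 2: logistic function
    return 1 if probability >= 0.5 else 0    # step 3: boundary condition

# Hypothetical example: 8 made-up weather readings and random parameters
rng = np.random.default_rng(0)
x = rng.normal(size=8)
theta = rng.normal(size=9)
print(predict_rain(x, theta))
```

With all-zero parameters, z = 0 gives a probability of exactly 0.5, which the boundary condition places in group 1.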
Our Model Takes the Following Form:

This is essentially a very simple neural network; we will cover neural networks in much more detail in a future episode.
The Logistic Function
The logistic function is given by the following formula:
σ(z) = 1 / (1 + e⁻ᶻ)
When we plot the logistic function for −6 < z < 6, our graph looks like this:

Essentially, the logistic function takes any value (z) and puts it on a scale between 0 and 1. This is useful since it can convert values outputted from our linear regression model (step 1) into a probability.
We can then implement a decision boundary:
hθ(𝑥) ≥ 0.5 → predict 1;  hθ(𝑥) < 0.5 → predict 0
This places the probability calculated by our logistic function into a group, i.e.:
- If the probability is greater than or equal to 0.5, output 1 (yes)
- If the probability is less than 0.5, output 0 (no)
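The logistic function and the decision boundary are simple enough to sketch directly; the function names here are my own, not from the article:

```python
import math

def logistic(z):
    """Squash any real value z onto the (0, 1) interval."""
    return 1.0 / (1.0 + math.exp(-z))

def decision_boundary(probability):
    """Place a probability into group 1 (yes) or group 0 (no)."""
    return 1 if probability >= 0.5 else 0

# Large negative z -> near 0, z = 0 -> exactly 0.5, large positive z -> near 1
for z in (-6, 0, 6):
    print(z, round(logistic(z), 4))
```

Note that logistic(0) is exactly 0.5, which the boundary condition assigns to group 1.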
The Cost Function
I will not go into detail about how the cost function is derived, as this is not necessary to know; however, I will try to explain the logic behind the logistic regression cost function.
Just as with linear regression, logistic regression also has a cost function. This cost function is used to calculate our model's parameters θ₀, θ₁, θ₂, …, θ₈ by gradient descent.
The cost function indicates how far our model's predicted values hθ(𝑥) are from our actual values y. With linear regression, we used the mean squared error for this.
Note the first two steps of our model can be combined into a single function:
hθ(𝑥) = σ(θ₀ + θ₁𝑥₁ + … + θ₈𝑥₈) = 1 / (1 + e^(−(θ₀ + θ₁𝑥₁ + … + θ₈𝑥₈)))
Since we have two possible outputs, 0 or 1, we need a cost function that can record the model's error for each. We use the following two cost functions:
Cost(hθ(𝑥), y) = −log(hθ(𝑥)) if y = 1
Cost(hθ(𝑥), y) = −log(1 − hθ(𝑥)) if y = 0
Why do These Cost Functions Work?
For our first cost function, where our actual output is y = 1, if our model predicts a value far from this, e.g. 0.01, we should have a high cost.
Let's test:
−log(0.01) = 2 is high ✓ reflecting bad model performance (high error).
Likewise, if our model predicts a value close to 1, e.g. 0.99, we should have a low cost.
Let's test:
−log(0.99) ≈ 0.004 is low ✓ reflecting good model performance (low error).
The same logic applies for y = 0.
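These spot checks can be reproduced in a few lines. The article's numbers use the base-10 logarithm (in practice the natural log is more common, but the intuition is identical), so the sketch below does too:

```python
import math

def cost_if_yes(h):
    """Cost when the actual answer is y = 1 (base-10 log, matching the text)."""
    return -math.log10(h)

def cost_if_no(h):
    """Cost when the actual answer is y = 0."""
    return -math.log10(1.0 - h)

print(cost_if_yes(0.01))  # high cost: prediction far from 1
print(cost_if_yes(0.99))  # low cost: prediction close to 1
print(cost_if_no(0.99))   # high cost: prediction far from 0
print(cost_if_no(0.01))   # low cost: prediction close to 0
```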
Combining the Cost Functions
We can do a sneaky maths trick to combine both cost functions into one:
Cost(hθ(𝑥), y) = −y log(hθ(𝑥)) − (1 − y) log(1 − hθ(𝑥))
This works since if y = 1 we get:
Cost(hθ(𝑥), y) = −log(hθ(𝑥))
as before, and if y= 0 we get:
Cost(hθ(𝑥), y) = −log(1 − hθ(𝑥))
also as before.
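The sneaky maths trick can be checked numerically. This quick sketch (my own helper name, using the natural log; any base cancels the same way) confirms that the combined cost reduces to each original case:

```python
import math

def combined_cost(h, y):
    """Combined logistic regression cost for a single observation."""
    return -y * math.log(h) - (1 - y) * math.log(1 - h)

h = 0.8
# When y = 1, the second term is multiplied by 0 and vanishes, leaving -log(h)
assert combined_cost(h, 1) == -math.log(h)
# When y = 0, the first term vanishes, leaving -log(1 - h)
assert combined_cost(h, 0) == -math.log(1 - h)
```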
Lastly, we sum our cost function over all data observations to get an overall cost for our model.
We then divide this total by our number of training examples m to obtain an average cost:
J(θ) = −(1/m) Σᵢ₌₁ᵐ [ y⁽ⁱ⁾ log(hθ(𝑥⁽ⁱ⁾)) + (1 − y⁽ⁱ⁾) log(1 − hθ(𝑥⁽ⁱ⁾)) ]
We can then apply gradient descent to our final cost function J(θ) to calculate our parameters θ₀, θ₁, θ₂, …, θ₈ for our model.
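Putting the pieces together, here is a minimal sketch of fitting θ by batch gradient descent on J(θ). The function names, learning rate, and toy data are my own assumptions for illustration, not the article's setup:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, learning_rate=0.1, iterations=1000):
    """Fit theta by batch gradient descent on the average cost J(theta).

    X : (m, n) feature matrix; y : (m,) array of 0/1 labels.
    A column of ones is prepended so theta[0] plays the role of θ₀.
    """
    m = X.shape[0]
    Xb = np.hstack([np.ones((m, 1)), X])   # prepend intercept column
    theta = np.zeros(Xb.shape[1])          # could also start from random values
    for _ in range(iterations):
        h = sigmoid(Xb @ theta)            # model predictions h_theta(x)
        gradient = Xb.T @ (h - y) / m      # gradient of J(theta)
        theta -= learning_rate * gradient  # step downhill
    return theta

# Toy 1-D example: points below 1.5 are labelled 0, above 1.5 labelled 1
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
theta = fit_logistic(X, y)
print(theta)
```

On this separable toy data, the fitted parameters give probabilities below 0.5 for x = 0 and above 0.5 for x = 3.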
Evaluating Logistic Regression Models
We can evaluate a logistic regression model using what's called a confusion matrix, shown below:
                    Predicted: No (0)     Predicted: Yes (1)
Actual: No (0)      True Negatives        False Positives
Actual: Yes (1)     False Negatives       True Positives
Here we have used the confusion matrix on 100 data observations.
The accuracy of our model is given by:
Accuracy = (True Positives + True Negatives) / Total Observations
We use this metric to evaluate our model and to make improvements to it.
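A confusion matrix and the accuracy formula are easy to compute by hand. This sketch uses a small made-up set of labels (not the 100 Albury observations from the article):

```python
import numpy as np

def confusion_matrix(actual, predicted):
    """Return counts (tn, fp, fn, tp) for binary 0/1 labels."""
    actual = np.asarray(actual)
    predicted = np.asarray(predicted)
    tn = int(np.sum((actual == 0) & (predicted == 0)))
    fp = int(np.sum((actual == 0) & (predicted == 1)))
    fn = int(np.sum((actual == 1) & (predicted == 0)))
    tp = int(np.sum((actual == 1) & (predicted == 1)))
    return tn, fp, fn, tp

def accuracy(actual, predicted):
    tn, fp, fn, tp = confusion_matrix(actual, predicted)
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical labels: one false negative and one false positive
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_matrix(actual, predicted))  # (3, 1, 1, 3)
print(accuracy(actual, predicted))          # 0.75
```

Libraries such as scikit-learn provide these metrics ready-made, but writing them out makes the accuracy formula above concrete.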
Summary
- Logistic regression is used to categorize data into groups
- It works similarly to linear regression but adds:
  - The logistic function, to convert values into probabilities
  - A boundary condition, to place probabilities into groups (0 or 1)
And there you have it. Thanks for reading.
