Multiple Linear Regression in Python

Code and data used can be found here: Repository 

This episode expands on Implementing Simple Linear Regression In Python. We extend our simple linear regression model to include more variables.

Instructions for setting up your programming environment can be found in the first section of Ep 4.3.

Importing our Data

The first step is to import our data into Python.

We can do that by going to the following link: Data

Click on “code” and download ZIP.

Locate WeatherDataM.csv and copy it to your local disk in a new folder named ProjectData.

Note: Keep this Medium post on a split screen so you can read and implement the code yourself.

Now we are ready to import our data into our Notebook:

# Import the pandas library, used for data manipulation
# Import matplotlib, used to plot our data
# Import NumPy, used for mathematical operations

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

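# Import our WeatherDataM and store it in the variable weather_data_m
# The r prefix makes this a raw string, so the backslashes in the Windows path are not read as escape characters
weather_data_m = pd.read_csv(r"D:\ProjectData\WeatherDataM.csv")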
# Display the data in the notebook
weather_data_m

Here we can see a table with all the variables we will be working with.
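
Before going further it can help to check that the table loaded the way we expect. The sketch below does not use the real WeatherDataM.csv: it reads a few made-up rows from an in-memory string so it runs anywhere. The column names are the ones used later in this post, while the values and the names sample_csv and sample_data are invented for illustration.

```python
import io
import pandas as pd

# Made-up sample rows standing in for WeatherDataM.csv (values are invented)
sample_csv = io.StringIO(
    "Temperature (C),Wind Speed (km/h),Pressure (millibars),Humidity\n"
    "9.5,14.1,1015.1,0.89\n"
    "17.8,10.3,1012.4,0.55\n"
    "25.2,6.8,1009.9,0.36\n"
)
sample_data = pd.read_csv(sample_csv)

# Columns we expect to use as inputs and output
expected_columns = ['Temperature (C)', 'Wind Speed (km/h)',
                    'Pressure (millibars)', 'Humidity']
missing_columns = [c for c in expected_columns if c not in sample_data.columns]

# Count empty cells per column; scikit-learn's LinearRegression
# does not accept missing values
missing_values = sample_data.isna().sum()

print(sample_data.head())
print("Missing columns:", missing_columns)
print(missing_values)
```

With the real file, the same checks can be run on weather_data_m.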

Plotting our Data

Each of our inputs X (Temperature, Wind Speed and Pressure) must form a linear relationship with our output y (Humidity) in order for our multiple linear regression model to be accurate.

Let’s plot our variables to confirm this.

Here we follow common Data Science convention, naming our inputs X and output y.

# Set the features of our model, these are our potential inputs
weather_features = ['Temperature (C)', 'Wind Speed (km/h)', 'Pressure (millibars)']

# Set the variable X to be all our input columns: Temperature, Wind Speed and Pressure
X = weather_data_m[weather_features]

# set y to be our output column: Humidity
y = weather_data_m.Humidity

# plt.subplot enables us to plot multiple graphs in one figure
# We produce scatter plots of Humidity against each of our input variables

plt.subplot(2,2,1)
plt.scatter(X['Temperature (C)'],y)
plt.subplot(2,2,2)
plt.scatter(X['Wind Speed (km/h)'],y)
plt.subplot(2,2,3)
plt.scatter(X['Pressure (millibars)'],y)
  • Humidity against Temperature forms a strong linear relationship
  • Humidity against Wind Speed forms a linear relationship
  • Humidity against Pressure forms no linear relationship
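
The scatter plots can be backed up with a number. One common choice is the Pearson correlation coefficient: values near 1 or -1 suggest a strong linear relationship and values near 0 suggest none. The sketch below uses randomly generated data, not the real weather data, with Pressure deliberately generated independently of Humidity. The name demo and all the numbers are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Randomly generated stand-in data (NOT the real weather data)
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "Temperature (C)": rng.uniform(-5, 35, n),
    "Wind Speed (km/h)": rng.uniform(0, 40, n),
    "Pressure (millibars)": rng.uniform(990, 1030, n),
})

# Humidity depends on Temperature and Wind Speed plus noise, but not on Pressure
demo["Humidity"] = (
    1.14
    - 0.031 * demo["Temperature (C)"]
    - 0.004 * demo["Wind Speed (km/h)"]
    + rng.normal(0, 0.02, n)
)

# Pearson correlation of each input column with Humidity
correlations = demo.drop(columns="Humidity").corrwith(demo["Humidity"])
print(correlations.round(2))
```

On the real data, the equivalent call would be X.corrwith(y) right after defining X and y.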

Pressure cannot be used in our model and is removed with the following code:

X = X.drop("Pressure (millibars)", axis=1)

We specify the name of the column we want to drop: Pressure (millibars).

The second argument is the axis number: 1 is used for columns and 0 for rows.
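
Recent pandas versions expect the axis to be passed by keyword rather than by position, so it is safer to spell it out. Here is a minimal sketch on a made-up two-row table (demo_X and its values are invented) showing two equivalent ways to drop the column:

```python
import pandas as pd

# Tiny made-up table with the same column names as our inputs
demo_X = pd.DataFrame({
    'Temperature (C)': [9.5, 17.8],
    'Wind Speed (km/h)': [14.1, 10.3],
    'Pressure (millibars)': [1015.1, 1012.4],
})

# Option 1: name the axis explicitly (1 = columns)
dropped_by_axis = demo_X.drop("Pressure (millibars)", axis=1)

# Option 2: use the columns keyword, which needs no axis number
dropped_by_columns = demo_X.drop(columns=["Pressure (millibars)"])

print(list(dropped_by_axis.columns))
print(dropped_by_axis.equals(dropped_by_columns))
```

Note that drop returns a new table and leaves the original unchanged, which is why we assign the result back to X.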

Because we are working with just two input variables, we can produce a 3D scatter plot of Humidity against Temperature and Wind Speed.

With more variables this would not be possible, as it would require a plot in four or more dimensions, which we as humans cannot visualise.

# Import library to produce a 3D plot
from mpl_toolkits.mplot3d import Axes3D

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

x1 = X["Temperature (C)"]
x2 = X["Wind Speed (km/h)"]

ax.scatter(x1, x2, y, c='r', marker='o')

# Set axis labels
ax.set_xlabel('Temperature (C)')
ax.set_ylabel('Wind Speed (km/h)')
ax.set_zlabel('Humidity')

Implementing Multiple Linear Regression

In order to calculate our model we need to import the LinearRegression class from the scikit-learn library. Its fit method calculates the parameters for our model (θ₀, θ₁ and θ₂) with one line of code.

from sklearn.linear_model import LinearRegression

# Define the variable mlr_model as our linear regression model
mlr_model = LinearRegression()
# Fit the model to our inputs X and output y
mlr_model.fit(X, y)

We can then display the values for θ₀, θ₁ and θ₂:

θ₀ is the intercept

θ₁ and θ₂ are what we call the coefficients of the model, as they multiply our X variables.

theta0 = mlr_model.intercept_
theta1, theta2 = mlr_model.coef_

theta0, theta1, theta2

Giving our multiple linear regression model as:

ŷ = 1.14 - 0.031x₁ - 0.004x₂
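
To get a feel for what the fit is doing, the sketch below solves the same kind of least squares problem with NumPy's np.linalg.lstsq instead of LinearRegression. It runs on randomly generated data, not the weather data, and the "true" parameter values are simply chosen to match the rounded equation above so we can check that they are recovered.

```python
import numpy as np

# Randomly generated stand-in data (NOT the real weather data)
rng = np.random.default_rng(42)
n = 500
temperature = rng.uniform(-5, 35, n)
wind_speed = rng.uniform(0, 40, n)

# Assumed "true" parameters, chosen to mirror the rounded equation
true_theta = np.array([1.14, -0.031, -0.004])
humidity = (
    true_theta[0]
    + true_theta[1] * temperature
    + true_theta[2] * wind_speed
    + rng.normal(0, 0.01, n)  # small random noise
)

# Design matrix: a column of ones for the intercept, then one column per input
A = np.column_stack([np.ones(n), temperature, wind_speed])

# Least squares solution: the theta that minimises the squared prediction error
theta, residuals, rank, singular_values = np.linalg.lstsq(A, humidity, rcond=None)
theta0, theta1, theta2 = theta

print(theta0, theta1, theta2)
```

The column of ones is what lets the solver estimate the intercept θ₀ alongside θ₁ and θ₂. LinearRegression handles the intercept for us by default.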

Using our Regression Model to make predictions

Now that we have calculated our model, it’s time to make predictions for Humidity given a Temperature and Wind Speed value:

y_pred = mlr_model.predict([[15, 21]])
y_pred

So a temperature of 15 °C and a wind speed of 21 km/h is expected to give a Humidity of 0.587.
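
The prediction can also be reproduced by hand from the equation. A quick check, assuming the rounded parameter values 1.14, -0.031 and -0.004 from the equation above:

```python
# Rounded parameters taken from the equation in this post
theta0, theta1, theta2 = 1.14, -0.031, -0.004

# Inputs for the prediction
temperature = 15   # degrees C
wind_speed = 21    # km/h

# y_hat = theta0 + theta1 * x1 + theta2 * x2
y_hat = theta0 + theta1 * temperature + theta2 * wind_speed
print(round(y_hat, 3))
```

This gives about 0.591, close to but not exactly the 0.587 returned by the model, presumably because the parameters in the equation are rounded while the model uses the full values.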

Side note:

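We passed our inputs as a 2D array by using double square brackets ( [[]] ), which is more compact than building a 1D array and reshaping it. The predict method expects one row per prediction and one column per input variable.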
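
The shape that the double square brackets produce can be checked with NumPy. This is a small sketch with illustrative variable names (single_sample, reshaped and several_samples are not from the original code):

```python
import numpy as np

# One sample with two features: Temperature and Wind Speed
single_sample = np.array([[15, 21]])
print(single_sample.shape)

# The same values written as a 1D array and reshaped into one row
reshaped = np.array([15, 21]).reshape(1, -1)
print(np.array_equal(single_sample, reshaped))

# Several samples at once: one inner list per prediction
several_samples = np.array([[15, 21], [20, 5], [0, 30]])
print(several_samples.shape)
```

Passing several_samples to mlr_model.predict would return one Humidity prediction per row.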

If you have any questions please leave them below!
