Code and data used can be found here: Repository
This episode expands on Implementing Simple Linear Regression In Python. We extend our simple linear regression model to include more variables.
Instructions for setting up your programming environment can be found in the first section of Ep 4.3.
Importing our Data
The first step is to import our data into Python.
We can get the data by following this link: Data
Click on “code” and download ZIP.

Locate WeatherDataM.csv and copy it to your local disk, into a new folder named ProjectData.
Note: Keep this medium post on a split screen so you can read and implement the code yourself.
Now we are ready to import our data into our Notebook:
# Import Pandas Library, used for data manipulation
# Import matplotlib, used to plot our data
# Import NumPy for mathematical operations
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
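# Import our WeatherDataM and store it in the variable weather_data_m
# The r prefix makes this a raw string, so the backslashes in the Windows path are not read as escapes
weather_data_m = pd.read_csv(r"D:\ProjectData\WeatherDataM.csv")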
# Display the data in the notebook
weather_data_m

Here we can see a table with all the variables we will be working with.
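Before plotting, it can help to confirm that the columns we expect are present and that none of them contain missing values, since LinearRegression will refuse to fit on data containing NaN. The following is a minimal sketch on a small hypothetical frame (the name demo and its values are made up for illustration and are not taken from WeatherDataM.csv); the same two checks could be run on weather_data_m.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for weather_data_m, with one missing value on purpose
demo = pd.DataFrame({
    "Temperature (C)": [9.5, 9.4, np.nan],
    "Wind Speed (km/h)": [14.1, 14.3, 3.9],
    "Pressure (millibars)": [1015.1, 1015.6, 1015.9],
    "Humidity": [0.89, 0.86, 0.89],
})

expected = ["Temperature (C)", "Wind Speed (km/h)", "Pressure (millibars)", "Humidity"]

# Names from the expected list that are absent from the frame
missing_columns = [name for name in expected if name not in demo.columns]

# Number of missing (NaN) entries in each expected column
missing_values = demo[expected].isna().sum()

print(missing_columns)
print(missing_values)
```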
Plotting our Data
Each of our inputs X (Temperature, Wind Speed and Pressure) must form a linear relationship with our output y (Humidity) in order for our multiple linear regression model to be accurate.
Let’s plot our variables to confirm this.
Here we follow common Data Science convention, naming our inputs X and output y.
# Set the features of our model, these are our potential inputs
weather_features = ['Temperature (C)', 'Wind Speed (km/h)', 'Pressure (millibars)']
# Set the variable X to be all our input columns: Temperature, Wind Speed and Pressure
X = weather_data_m[weather_features]
# Set y to be our output column: Humidity
y = weather_data_m.Humidity
# plt.subplot enables us to plot multiple graphs
# We produce scatter plots for Humidity against each of our input variables
plt.subplot(2,2,1)
plt.scatter(X['Temperature (C)'],y)
plt.subplot(2,2,2)
plt.scatter(X['Wind Speed (km/h)'],y)
plt.subplot(2,2,3)
plt.scatter(X['Pressure (millibars)'],y)

- Humidity against Temperature forms a strong linear relationship ✓
- Humidity against Wind Speed forms a linear relationship ✓
- Humidity against Pressure forms no linear relationship ✗
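Judging linearity by eye can be backed up with a number. One common option is the Pearson correlation coefficient, which is close to +1 or -1 for a strong linear relationship and close to 0 when there is none. The sketch below uses synthetic data (the frame demo and all of its values are invented for illustration, not read from the real CSV) to show how DataFrame.corr could be used for this check.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the weather data (hypothetical values, not the real CSV)
rng = np.random.default_rng(0)
temperature = rng.uniform(-5, 35, size=200)
demo = pd.DataFrame({
    "Temperature (C)": temperature,
    # Humidity falls roughly linearly as temperature rises, plus a little noise
    "Humidity": 0.9 - 0.015 * temperature + rng.normal(0, 0.03, size=200),
    # Pressure is generated independently of humidity
    "Pressure (millibars)": rng.normal(1013, 8, size=200),
})

# Pearson correlation of each input column with Humidity
correlations = demo.corr()["Humidity"].drop("Humidity")
print(correlations)
```

On this synthetic frame the temperature column correlates strongly (and negatively) with humidity, while the pressure column does not. Running the same line on weather_data_m would give the corresponding figures for the real data.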
Pressure cannot be used in our model, so we remove it with the following code:
X = X.drop("Pressure (millibars)", axis=1)
We specify the name of the column we want to drop: Pressure (millibars)
axis=1 tells pandas to drop a column; axis=0 would drop a row. The axis must be passed by keyword, as recent versions of pandas no longer accept it as a second positional argument.
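As a small illustration of how drop behaves, the sketch below removes the pressure column from a hypothetical two-row frame (demo and its values are made up for this example). It shows two equivalent spellings, axis=1 and columns=, and that drop returns a new frame rather than changing the original, which is why the result has to be assigned back to X.

```python
import pandas as pd

# Hypothetical two-row frame with the same column names as the article
demo = pd.DataFrame({
    "Temperature (C)": [9.5, 9.4],
    "Wind Speed (km/h)": [14.1, 14.3],
    "Pressure (millibars)": [1015.1, 1015.6],
})

# Two equivalent ways to drop a column
dropped_by_axis = demo.drop("Pressure (millibars)", axis=1)
dropped_by_columns = demo.drop(columns=["Pressure (millibars)"])

print(list(dropped_by_axis.columns))
# The original frame still has all three columns
print(list(demo.columns))
```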
Because we are working with just two input variables, we can produce a 3D scatter plot of Humidity against Temperature and Wind Speed.
With more input variables this would not be possible, as it would require a plot in four or more dimensions, which we cannot visualise.
# Import library to produce a 3D plot
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
x1 = X["Temperature (C)"]
x2 = X["Wind Speed (km/h)"]
ax.scatter(x1, x2, y, c='r', marker='o')
# Set axis labels
ax.set_xlabel('Temperature (C)')
ax.set_ylabel('Wind Speed (km/h)')
ax.set_zlabel('Humidity')

Implementing Multiple Linear Regression
To calculate our model we need to import the LinearRegression class from the scikit-learn library. Its fit method calculates the parameters of our model (θ₀, θ₁ and θ₂) with one line of code.
from sklearn.linear_model import LinearRegression
# Define the variable mlr_model as our linear regression model
mlr_model = LinearRegression()
mlr_model.fit(X, y)
We can then display the values for θ₀, θ₁ and θ₂:
θ₀ is the intercept
θ₁ and θ₂ are what we call the coefficients of the model, as they multiply our X variables.
theta0 = mlr_model.intercept_
theta1, theta2 = mlr_model.coef_
theta0, theta1, theta2

Giving our multiple linear regression model as:
ŷ = 1.14 - 0.031x₁ - 0.004x₂
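To see how the intercept and coefficients combine into a prediction, the sketch below fits a model on synthetic data and then evaluates θ₀ + θ₁x₁ + θ₂x₂ by hand, comparing the result with predict. The data here is generated from a known plane purely for illustration (X_demo, y_demo and demo_model are hypothetical names, and the numbers are not the weather data), so the fitted parameters can be checked against the values used to create it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data generated from a known plane (hypothetical, not the weather CSV)
rng = np.random.default_rng(42)
X_demo = rng.uniform(0, 30, size=(100, 2))  # two input columns
y_demo = 1.0 - 0.03 * X_demo[:, 0] - 0.004 * X_demo[:, 1]

demo_model = LinearRegression().fit(X_demo, y_demo)
theta0 = demo_model.intercept_
theta1, theta2 = demo_model.coef_

# Evaluate the fitted equation by hand for one input row
x1, x2 = 15.0, 21.0
manual = theta0 + theta1 * x1 + theta2 * x2

# Ask the model for the same prediction
from_model = demo_model.predict([[x1, x2]])[0]

print(manual, from_model)
```

The two printed values agree up to floating-point rounding, which is all predict does for a linear regression model: it plugs the inputs into the fitted equation.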
Using our Regression Model to make predictions
Now that we have calculated our model, it’s time to predict Humidity for a given Temperature and Wind Speed:
y_pred = mlr_model.predict([[15, 21]])
y_pred

So a temperature of 15 °C and a wind speed of 21 km/h are predicted to give a Humidity of 0.587.
Side note:
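We passed our input values as a 2D array by using double square brackets ( [[]] ). predict expects one row per sample and one column per input variable, so [[15, 21]] is a single sample with two values. Writing the brackets directly is more concise than building a 1D array and reshaping it.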
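The difference between single and double square brackets is easiest to see by looking at array shapes. This short NumPy sketch (the variable names are chosen for illustration) compares the two, along with the reshape call that double brackets let us avoid.

```python
import numpy as np

# Double brackets: a 2D array with 1 row (sample) and 2 columns (features)
single_sample = np.array([[15, 21]])

# Single brackets: a 1D array with 2 elements
flat = np.array([15, 21])

# reshape(1, -1) turns the 1D array into the same 2D layout
reshaped = flat.reshape(1, -1)

print(single_sample.shape, flat.shape, reshaped.shape)
# prints: (1, 2) (2,) (1, 2)
```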
