How to Evaluate Your Classification Algorithm in Python

In the previous episode, we went over the different methods you can use to evaluate a classification algorithm.

In this episode, we focus on applying these methods in Python.

You can view and use the code and data used in this episode here: Link

1. Building the Classification Algorithm

First, we need to build a classification algorithm to evaluate. In this case, we will evaluate the non-linear support vector machine built in episode 9.3.

You can copy and paste the following code into your Python script.

Ensure you save the data and change the file path to where your data is saved.

import warnings
warnings.filterwarnings("ignore")

import pandas as pd

# read data (a raw string avoids backslash escapes in the Windows path)
star_data = pd.read_csv(r"D:\ProjectData\pulsar_data.csv")

# drop rows containing missing values 
star_data.dropna(inplace = True)

# remove spaces in column headings 
star_data.columns = star_data.columns.str.strip()

# define input (X) and output (y) data of the algorithm 
X = star_data.drop(columns='target_class') 
y = star_data['target_class']

# perform data standardization 
from sklearn.preprocessing import StandardScaler

s_scaler = StandardScaler()

X_ss = pd.DataFrame(s_scaler.fit_transform(X), columns = X.columns)

# split the data into a training and test set 
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_ss, y, test_size=0.25, random_state=42)

# build the support vector machine 
from sklearn import svm

clf_rbf = svm.SVC(kernel = 'rbf', C = 10)

clf_rbf.fit(X_train, y_train)

# obtain a set of predictions 
y_pred = clf_rbf.predict(X_test)
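
As an optional sanity check before looking at individual metrics, you can print the model's mean accuracy on the test set using score, sklearn's built-in shortcut:

# quick sanity check: mean accuracy on the held-out test set
print(clf_rbf.score(X_test, y_test))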

2. Evaluating the Classification Algorithm

Now that we have built our classification algorithm, we can assess its performance using a range of metrics. An explanation of each metric can be found in episode 10.3.

We can obtain the true positives, false positives, true negatives and false negatives of our model using confusion_matrix from sklearn. For binary labels, the matrix has the actual classes as rows and the predicted classes as columns, which is what the indexing below relies on.

# obtain true positives, false positives, true negatives and false negatives
from sklearn.metrics import confusion_matrix

CM = confusion_matrix(y_test, y_pred)

TP = CM[1][1] 
FP = CM[0][1] 
TN = CM[0][0] 
FN = CM[1][0]
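
For a binary problem, you can also unpack all four counts in a single line; this is an equivalent alternative to the indexing above:

# confusion_matrix returns [[TN, FP], [FN, TP]] for binary labels
TN, FP, FN, TP = confusion_matrix(y_test, y_pred).ravel()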

These are useful later for calculating some of the evaluation metrics.

Accuracy

from sklearn.metrics import accuracy_score

accuracy_score(y_test, y_pred)
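
As a cross-check, the same value can be computed by hand from the confusion-matrix counts obtained above:

# accuracy = correct predictions / all predictions
(TP + TN) / (TP + TN + FP + FN)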

Precision

from sklearn.metrics import precision_score

precision_score(y_test, y_pred)
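
Again, this should match the formula applied to the raw counts:

# precision = TP / (TP + FP)
TP / (TP + FP)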

Recall

from sklearn.metrics import recall_score

recall_score(y_test, y_pred)
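
And by hand:

# recall = TP / (TP + FN)
TP / (TP + FN)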

False Positive Rate

False_Positive_Rate = FP/(FP + TN)

False_Positive_Rate

Specificity

Specificity is the true negative rate, i.e. the complement of the false positive rate:

Specificity = TN/(TN + FP)

Specificity

Sensitivity

Sensitivity is another name for recall, so this should match the recall_score result above:

Sensitivity = TP/(TP + FN)

Sensitivity

F1 Score

from sklearn.metrics import f1_score

f1_score(y_test, y_pred)
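
Since the F1 score is the harmonic mean of precision and recall, you can verify the result from the two scores computed earlier:

from sklearn.metrics import precision_score, recall_score

# F1 = harmonic mean of precision and recall
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
2 * precision * recall / (precision + recall)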

AUROC Score

To plot our ROC curve we can use RocCurveDisplay (the older plot_roc_curve helper has been removed from recent versions of sklearn). The plot's legend also shows the AUROC value rounded to two decimal places.

from sklearn import metrics

metrics.RocCurveDisplay.from_estimator(clf_rbf, X_test, y_test)

To obtain the AUROC value:

from sklearn.metrics import roc_auc_score

# roc_auc_score needs continuous scores, so we pass the SVM's
# decision-function values rather than the class predictions
roc_auc_score(y_test, clf_rbf.decision_function(X_test))
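
If you want the raw curve points rather than the plot, roc_curve returns the false positive rates, true positive rates and thresholds directly, and auc integrates them; the result should agree with roc_auc_score above:

from sklearn import metrics

# compute the ROC curve points and integrate them to obtain the AUROC
fpr, tpr, thresholds = metrics.roc_curve(y_test, clf_rbf.decision_function(X_test))
metrics.auc(fpr, tpr)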
