In the previous episode, we went over the different methods you can use to evaluate your classification algorithm.
In this episode, we focus on applying these methods in Python.
You can view and use the code and data used in this episode here: Link
1. Building the Classification Algorithm
First we need to build a classification algorithm to evaluate. In this case, we are going to be evaluating the non-linear support vector machine built in episode 9.3.
You can copy and paste the following code into your Python script.
Make sure you have saved the data, and change the file path to where your data is saved.
import warnings
warnings.filterwarnings("ignore")
import pandas as pd
# read data
star_data = pd.read_csv(r"D:\ProjectData\pulsar_data.csv")  # raw string so the backslashes in the Windows path are not treated as escapes
# drop rows containing missing values
star_data.dropna(inplace = True)
# remove spaces in column headings
star_data.columns = star_data.columns.str.strip()
# define input (X) and output (y) data of the algorithm
X = star_data.drop('target_class', axis = 1)
y = star_data['target_class']
# perform data standardization
from sklearn.preprocessing import StandardScaler
s_scaler = StandardScaler()
X_ss = pd.DataFrame(s_scaler.fit_transform(X), columns = X.columns)
# split the data into a training and test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_ss, y, test_size=0.25, random_state=42)
# build the support vector machine
from sklearn import svm
clf_rbf = svm.SVC(kernel = 'rbf', C = 10)
clf_rbf.fit(X_train, y_train)
# obtain a set of predictions
y_pred = clf_rbf.predict(X_test)
2. Evaluating the Classification Algorithm
Now that we have built our classification algorithm, we can check its performance under different metrics. An explanation of each metric can be found in episode 10.3.
We can obtain the true positives, false positives, true negatives and false negatives of our model using confusion_matrix from sklearn.
# obtain true positives, false positives, true negatives and false negatives
from sklearn.metrics import confusion_matrix
CM = confusion_matrix(y_test, y_pred)
TP = CM[1][1]
FP = CM[0][1]
TN = CM[0][0]
FN = CM[1][0]
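If you prefer, the same four counts can be unpacked in a single line, since for a binary problem confusion_matrix returns the counts in the order TN, FP, FN, TP when flattened:
# equivalent one-line unpacking of the binary confusion matrix
TN, FP, FN, TP = confusion_matrix(y_test, y_pred).ravel()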
These are useful later for calculating some of the evaluation metrics.
Accuracy
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)
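As a sanity check, accuracy can also be recomputed directly from the confusion matrix counts obtained above; the result should match accuracy_score:
# accuracy is the fraction of predictions that are correct
(TP + TN) / (TP + FP + TN + FN)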

Precision
from sklearn.metrics import precision_score
precision_score(y_test, y_pred)

Recall
from sklearn.metrics import recall_score
recall_score(y_test, y_pred)

False Positive Rate
False_Positive_Rate = FP/(FP + TN)
False_Positive_Rate

Specificity
Specificity = TN/(TN + FP)
Specificity

Sensitivity
Sensitivity = TP/(TP + FN)
Sensitivity
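Note that sensitivity is the same quantity as recall, and specificity is equal to one minus the false positive rate; a quick check using the values computed above:
from sklearn.metrics import recall_score
# sensitivity should match recall, and specificity should match 1 - FPR
print(Sensitivity, recall_score(y_test, y_pred))
print(Specificity, 1 - False_Positive_Rate)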

F1 Score
from sklearn.metrics import f1_score
f1_score(y_test, y_pred)
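Several of these metrics can also be obtained in one go using classification_report from sklearn, which prints the precision, recall, F1 score and support for each class:
from sklearn.metrics import classification_report
# per-class precision, recall, F1 score and support in a single table
print(classification_report(y_test, y_pred))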

AUROC Score
To plot our ROC curve we can use the following code. The plot legend also shows our AUROC value, rounded to 2 decimal places.
from sklearn.metrics import RocCurveDisplay
# plot the ROC curve for the fitted classifier on the test set
RocCurveDisplay.from_estimator(clf_rbf, X_test, y_test)

To obtain the AUROC value:
from sklearn.metrics import roc_auc_score
roc_auc_score(y_test, clf_rbf.decision_function(X_test))
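If you want more control over the ROC plot (for example the axis labels or styling), the curve can also be built manually with roc_curve and auc; a minimal sketch, assuming matplotlib is available:
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc
# compute the ROC curve from the decision function scores
fpr, tpr, thresholds = roc_curve(y_test, clf_rbf.decision_function(X_test))
plt.plot(fpr, tpr, label = f"AUROC = {auc(fpr, tpr):.2f}")
plt.plot([0, 1], [0, 1], linestyle = "--")  # chance line
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()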

