296 Chapter 5 Classification: Alternative Techniques

Table 5.6. A confusion matrix for a binary classification problem in which the classes are not equally important.

                              Predicted Class
                              +           -
    Actual Class     +    f++ (TP)    f+- (FN)
                     -    f-+ (FP)    f-- (TN)

denoted as the negative class. A confusion matrix that summarizes the number of instances predicted correctly or incorrectly by a classification model is shown in Table 5.6.

The following terminology is often used when referring to the counts tabulated in a confusion matrix:

- True positive (TP) or f++, which corresponds to the number of positive examples correctly predicted by the classification model.

- False negative (FN) or f+-, which corresponds to the number of positive examples wrongly predicted as negative by the classification model.

- False positive (FP) or f-+, which corresponds to the number of negative examples wrongly predicted as positive by the classification model.

- True negative (TN) or f--, which corresponds to the number of negative examples correctly predicted by the classification model.
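The four counts can be tallied directly from lists of actual and predicted class labels. The following sketch makes the definitions concrete; the labels are illustrative, not taken from the text:

```python
# Tally the four confusion-matrix counts from actual and predicted
# labels ("+" denotes the positive class, "-" the negative class).
def confusion_counts(actual, predicted):
    tp = sum(1 for a, p in zip(actual, predicted) if a == "+" and p == "+")
    fn = sum(1 for a, p in zip(actual, predicted) if a == "+" and p == "-")
    fp = sum(1 for a, p in zip(actual, predicted) if a == "-" and p == "+")
    tn = sum(1 for a, p in zip(actual, predicted) if a == "-" and p == "-")
    return tp, fn, fp, tn

actual    = ["+", "+", "+", "-", "-", "-"]
predicted = ["+", "+", "-", "+", "-", "-"]
print(confusion_counts(actual, predicted))  # (2, 1, 1, 2)
```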

The counts in a confusion matrix can also be expressed in terms of percentages.

The true positive rate (TPR) or sensitivity is defined as the fraction of positive examples predicted correctly by the model, i.e.,

TPR = TP/(TP + FN).

Similarly, the true negative rate (TNR) or specificity is defined as the fraction of negative examples predicted correctly by the model, i.e.,

TNR = TN/(TN + FP).

Finally, the false positive rate (FPR) is the fraction of negative examples predicted as a positive class, i.e.,

FPR = FP/(TN + FP),

5.7 Class Imbalance Problem 297

while the false negative rate (FNR) is the fraction of positive examples predicted as a negative class, i.e.,

FNR = FN/(TP + FN).
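All four rates follow directly from the confusion-matrix counts. A small sketch, using illustrative counts (TP = 40, FN = 10, FP = 5, TN = 45):

```python
# Compute TPR, TNR, FPR, FNR from the confusion-matrix counts.
def rates(tp, fn, fp, tn):
    tpr = tp / (tp + fn)  # sensitivity
    tnr = tn / (tn + fp)  # specificity
    fpr = fp / (tn + fp)
    fnr = fn / (tp + fn)
    return tpr, tnr, fpr, fnr

print(rates(40, 10, 5, 45))  # (0.8, 0.9, 0.1, 0.2)
```

Note that by construction TPR + FNR = 1 and TNR + FPR = 1, since each pair shares the same denominator.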

Recall and precision are two widely used metrics employed in applications where successful detection of one of the classes is considered more significant than detection of the other classes. A formal definition of these metrics is given below.

Precision, p = TP/(TP + FP)    (5.74)

Recall, r = TP/(TP + FN)    (5.75)

Precision determines the fraction of records that actually turn out to be positive among the records the classifier has declared to be the positive class. The higher the precision, the lower the number of false positive errors committed by the classifier. Recall measures the fraction of positive examples correctly predicted by the classifier. Classifiers with large recall have very few positive examples misclassified as the negative class. In fact, the value of recall is equivalent to the true positive rate.
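Equations 5.74 and 5.75 translate directly into code; the counts below are illustrative:

```python
# Precision (Eq. 5.74): fraction of predicted positives that are
# actually positive.
def precision(tp, fp):
    return tp / (tp + fp)

# Recall (Eq. 5.75): fraction of actual positives that are predicted
# positive; identical to the true positive rate.
def recall(tp, fn):
    return tp / (tp + fn)

# Illustrative counts: TP = 30, FP = 10, FN = 20.
print(precision(30, 10))  # 0.75
print(recall(30, 20))     # 0.6
```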

It is often possible to construct baseline models that maximize one metric but not the other. For example, a model that declares every record to be the positive class will have perfect recall, but very poor precision. Conversely, a model that assigns a positive class to every test record that matches one of the positive records in the training set has very high precision, but low recall. Building a model that maximizes both precision and recall is the key challenge of classification algorithms.

Precision and recall can be summarized into another metric known as the F1 measure:

F1 = 2rp/(r + p) = 2 × TP/(2 × TP + FP + FN)    (5.76)

In principle, F1 represents a harmonic mean between recall and precision, i.e.,

F1 = 2/(1/r + 1/p).

The harmonic mean of two numbers x and y tends to be closer to the smaller of the two numbers. Hence, a high value of the F1 measure ensures that both precision and recall are reasonably high. A comparison among harmonic, geometric, and arithmetic means is given in the next example.

Example 5.8. Consider two positive numbers a = 1 and b = 5. Their arithmetic mean is μa = (a + b)/2 = 3 and their geometric mean is μg = √(a × b) = 2.236. Their harmonic mean is μh = (2 × 1 × 5)/6 = 1.667, which is closer to the smaller value between a and b than the arithmetic and geometric means.
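The arithmetic in Example 5.8 can be verified numerically:

```python
# Verify the three means for a = 1, b = 5 from Example 5.8.
import math

a, b = 1, 5
arithmetic = (a + b) / 2          # 3.0
geometric  = math.sqrt(a * b)     # about 2.236
harmonic   = 2 * a * b / (a + b)  # about 1.667, closest to min(a, b)

print(round(arithmetic, 3), round(geometric, 3), round(harmonic, 3))
```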

More generally, the Fβ measure can be used to examine the tradeoff between recall and precision:

Fβ = (β² + 1)rp/(r + β²p) = (β² + 1) × TP/((β² + 1)TP + β²FN + FP)    (5.77)

Both precision and recall are special cases of Fβ, obtained by setting β = 0 and β = ∞, respectively. Low values of β make Fβ closer to precision, and high values make it closer to recall.
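A sketch of Fβ in terms of recall and precision illustrates this limiting behavior; the values r = 0.6 and p = 0.9 are illustrative:

```python
# F_beta measure (Eq. 5.77) expressed in terms of recall r and
# precision p.
def f_beta(r, p, beta):
    b2 = beta ** 2
    return (b2 + 1) * r * p / (r + b2 * p)

r, p = 0.6, 0.9
print(round(f_beta(r, p, 0), 4))     # beta = 0 recovers precision: 0.9
print(round(f_beta(r, p, 1), 4))     # F1, the harmonic mean: 0.72
print(round(f_beta(r, p, 1000), 4))  # large beta approaches recall (0.6)
```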

A more general metric that captures Fβ as well as accuracy is the weighted accuracy measure, which is defined by the following equation:

Weighted accuracy = (w1 × TP + w4 × TN)/(w1 × TP + w2 × FP + w3 × FN + w4 × TN)    (5.78)

The relationship between weighted accuracy and other performance metrics is summarized in the following table:

Measure      w1       w2    w3    w4
Recall       1        0     1     0
Precision    1        1     0     0
Fβ           β² + 1   1     β²    0
Accuracy     1        1     1     1
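Substituting the weights from the table into Equation 5.78 recovers each metric. A quick numerical check with illustrative counts:

```python
# Weighted accuracy (Eq. 5.78); with suitable weights it reduces to
# recall, precision, F_beta, or accuracy.
def weighted_accuracy(tp, fp, fn, tn, w1, w2, w3, w4):
    return (w1 * tp + w4 * tn) / (w1 * tp + w2 * fp + w3 * fn + w4 * tn)

tp, fp, fn, tn = 30, 10, 20, 40
print(weighted_accuracy(tp, fp, fn, tn, 1, 0, 1, 0))  # recall: 0.6
print(weighted_accuracy(tp, fp, fn, tn, 1, 1, 0, 0))  # precision: 0.75
print(weighted_accuracy(tp, fp, fn, tn, 1, 1, 1, 1))  # accuracy: 0.7
```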

5.7.2 The Receiver Operating Characteristic Curve

A receiver operating characteristic (ROC) curve is a graphical approach for displaying the tradeoff between the true positive rate and the false positive rate of a classifier. In an ROC curve, the true positive rate (TPR) is plotted along the y axis and the false positive rate (FPR) is shown on the x axis. Each point along the curve corresponds to one of the models induced by the classifier. Figure 5.41 shows the ROC curves for a pair of classifiers, M1 and M2.

Figure 5.41. ROC curves for two different classifiers. (x axis: False Positive Rate; y axis: True Positive Rate.)

There are several critical points along an ROC curve that have well-known interpretations:

(TPR = 0, FPR = 0): Model predicts every instance to be a negative class.
(TPR = 1, FPR = 1): Model predicts every instance to be a positive class.
(TPR = 1, FPR = 0): The ideal model.

A good classification model should be located as close as possible to the upper left corner of the diagram, while a model that makes random guesses should reside along the main diagonal, connecting the points (TPR = 0, FPR = 0) and (TPR = 1, FPR = 1). Random guessing means that a record is classified as a positive class with a fixed probability p, irrespective of its attribute set. For example, consider a data set that contains n+ positive instances and n- negative instances. The random classifier is expected to correctly classify p·n+ of the positive instances and to misclassify (i.e., label as positive) p·n- of the negative instances. Therefore, the TPR of the classifier is (p·n+)/n+ = p, while its FPR is (p·n-)/n- = p. Since the TPR and FPR are identical, the ROC curve for a random classifier always resides along the main diagonal.

An ROC curve is useful for comparing the relative performance among different classifiers. In Figure 5.41, M1 is better than M2 when FPR is less


than 0.36, while M2 is superior when FPR is greater than 0.36. Clearly, neither of these two classifiers dominates the other.

The area under the ROC curve (AUC) provides another approach for evaluating which model is better on average. If the model is perfect, then its area under the ROC curve would equal 1. If the model simply performs random guessing, then its area under the ROC curve would equal 0.5. A model that is strictly better than another would have a larger area under the ROC curve.
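With the ROC curve represented as a list of (FPR, TPR) points, the area under it can be computed by the trapezoidal rule. The sketch below checks the two limiting cases just described; the point lists are illustrative:

```python
# Area under an ROC curve by the trapezoidal rule, given (FPR, TPR)
# points sorted by increasing FPR.
def auc(points):
    area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        area += (x2 - x1) * (y1 + y2) / 2
    return area

# A random-guessing classifier lies on the main diagonal: AUC = 0.5.
diagonal = [(0, 0), (0.5, 0.5), (1, 1)]
print(auc(diagonal))  # 0.5

# A perfect classifier passes through (FPR = 0, TPR = 1): AUC = 1.
perfect = [(0, 0), (0, 1), (1, 1)]
print(auc(perfect))   # 1.0
```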

Generating an ROC curve

To draw an ROC curve, the classifier should be able to produce a continuous-valued output that can be used to rank its predictions, from the record most likely to be classified as a positive class to the least likely record. These outputs may correspond to the posterior probabilities generated by a Bayesian classifier or the numeric-valued outputs produced by an artificial neural network. The following procedure can then be used to generate an ROC curve:

1. Assuming that the continuous-valued outputs are defined for the positive class, sort the test records in increasing order of their output values.

2. Select the lowest ranked test record (i.e., the record with the lowest output value). Assign the selected record and those ranked above it to the positive class. This approach is equivalent to classifying all the test records as the positive class. Because all the positive examples are classified correctly and all the negative examples are misclassified, TPR = FPR = 1.

3. Select the next test record from the sorted list. Classify the selected record and those ranked above it as positive, while those ranked below it are classified as negative. Update the counts of TP and FP by examining the actual class label of the previously selected record. If the previously selected record belongs to the positive class, the TP count is decremented and the FP count remains the same as before. If the previously selected record belongs to the negative class, the FP count is decremented and the TP count remains the same as before.

4. Repeat Step 3 and update the TP and FP counts accordingly until the highest ranked test record is selected.

5. Plot the TPR against the FPR of the classifier.
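The procedure above can be sketched as follows: the threshold sweeps over the sorted scores, dropping one record from the positive side at each step. The scores and labels are illustrative, not the ones in Figure 5.42:

```python
# Generate the (FPR, TPR) points of an ROC curve by sweeping a
# threshold over the sorted classifier outputs ("+" = positive class).
def roc_points(scores, labels):
    ranked = sorted(zip(scores, labels))  # increasing output value
    n_pos = sum(1 for lab in labels if lab == "+")
    n_neg = len(labels) - n_pos
    tp, fp = n_pos, n_neg  # Step 2: every record classified positive
    points = [(fp / n_neg, tp / n_pos)]
    for _, lab in ranked:  # Steps 3-4: decrement TP or FP each pass
        if lab == "+":
            tp -= 1
        else:
            fp -= 1
        points.append((fp / n_neg, tp / n_pos))
    return points

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = ["+", "+", "-", "+", "-", "-"]
print(roc_points(scores, labels))  # starts at (1.0, 1.0), ends at (0.0, 0.0)
```

The first point corresponds to classifying everything as positive and the last to classifying everything as negative, matching the endpoints of the curve.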

Figure 5.42 shows an example of how to compute the ROC curve. There are five positive examples and five negative examples in the test set. The class


Figure 5.42. Computing the ROC curve: TP, FP, TN, FN counts and the corresponding TPR and FPR at each threshold.

TP    5    4    4    3    3    3    3    2    2    1    0
FP    5    5    4    4    3    2    1    1    0    0    0
TN    0    0    1    1    2    3    4    4    5    5    5
FN    0    1    1    2    2    2    2    3    3    4    5
TPR   1    0.8  0.8  0.6  0.6  0.6  0.6  0.4  0.4  0.2  0
FPR   1    1    0.8  0.8  0.6  0.4  0.2  0.2  0    0    0
