7.1. Validation of Classifiers

Text classification is a supervised Machine Learning (ML) task. As depicted in the picture above, supervised ML relies on labeled data, not only for training but also for model testing. It is important that the training and test datasets are disjoint.
Once a model is trained, it is applied to the test data to calculate predictions for the test inputs. Since the true labels (outputs) of the test data are also known, they can be compared with the predicted labels. Based on this comparison, different metrics for classifier evaluation can be calculated. The most important classifier metrics are described below.
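As a minimal sketch of producing such a disjoint split with scikit-learn's `train_test_split` (the variables `X` and `y` are placeholders for the vectorized texts and their labels, not data from this section):

```python
from sklearn.model_selection import train_test_split

# Placeholder data: X = feature vectors (e.g. vectorized texts), y = class labels
X = [[0.1, 0.3], [0.2, 0.1], [0.9, 0.8], [0.7, 0.6], [0.4, 0.5], [0.8, 0.2]]
y = [0, 0, 1, 1, 0, 1]

# Hold out one third of the samples as a disjoint test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=42)

print(len(X_train), len(X_test))  # 4 2
```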
Assume that for 10 test samples the true and predicted labels (class indices) are as listed in the table below:
| True Label | Predicted Label |
|---|---|
| 0 | 0 |
| 0 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 0 |
| 0 | 0 |
| 1 | 0 |
| 0 | 0 |
| 0 | 0 |
| 0 | 0 |
All of the metrics described below can be calculated from this comparison of true and predicted labels.
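The metrics can also be computed with scikit-learn. As a small sketch (assuming scikit-learn is available), the two lists below encode the true and predicted labels from the table above; they are reused in the snippets of the following subsections:

```python
# True and predicted labels of the 10 test samples from the table above
y_true = [0, 0, 1, 1, 1, 0, 1, 0, 0, 0]
y_pred = [0, 1, 1, 1, 0, 0, 0, 0, 0, 0]
```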
7.1.1. Confusion matrix
The confusion matrix contains, for each pair of classes \(i\) and \(j\), the number of class-\(i\) elements that have been predicted to be class \(j\). Usually, each row corresponds to a true class label and each column to a predicted class label.
In the general 2-class confusion matrix, depicted below, the class labels are \(P\) (positive) and \(N\) (negative). The matrix entries are then:

- TP (True Positives): number of samples which belong to class \(P\) and have correctly been predicted to be class \(P\)
- TN (True Negatives): number of samples which belong to class \(N\) and have correctly been predicted to be class \(N\)
- FP (False Positives): number of samples which belong to class \(N\) but have falsely been predicted to be class \(P\)
- FN (False Negatives): number of samples which belong to class \(P\) but have falsely been predicted to be class \(N\)

|  | Predicted \(P\) | Predicted \(N\) |
|---|---|---|
| True \(P\) | TP | FN |
| True \(N\) | FP | TN |
For the given example of true and predicted class labels, taking class 1 as the positive class \(P\) and class 0 as the negative class \(N\), the confusion matrix is:

|  | Predicted 0 (\(N\)) | Predicted 1 (\(P\)) |
|---|---|---|
| True 0 (\(N\)) | 5 | 1 |
| True 1 (\(P\)) | 2 | 2 |

Hence TP = 2, TN = 5, FP = 1 and FN = 2.
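A sketch of how the same matrix can be obtained with scikit-learn's `confusion_matrix`, reusing `y_true` and `y_pred` from above:

```python
from sklearn.metrics import confusion_matrix

# Rows correspond to the true classes, columns to the predicted classes
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(cm)
# [[5 1]
#  [2 2]]
```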
7.1.2. Accuracy
Accuracy is the ratio of correct predictions among all predictions. For the 2-class problem with the labels \(P\) and \(N\), accuracy can be calculated from the entries of the confusion matrix:

$$
Acc = \frac{TP+TN}{TP+TN+FP+FN}
$$

In the example the accuracy is

$$
Acc = \frac{5+2}{5+2+1+2} = 0.7
$$
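The same value can be obtained with scikit-learn's `accuracy_score` (a sketch reusing `y_true` and `y_pred` from above):

```python
from sklearn.metrics import accuracy_score

print(accuracy_score(y_true, y_pred))  # 0.7
```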
7.1.3. Recall
The recall of class \(i\) is the ratio of correctly predicted class-\(i\) elements among all elements which truly belong to class \(i\). The recall of class \(P\) is

$$
Rec_P = \frac{TP}{TP+FN}
$$

and for class \(N\):

$$
Rec_N = \frac{TN}{TN+FP}
$$

In the example:

$$
Rec_P = \frac{2}{2+2} = 0.5, \qquad Rec_N = \frac{5}{5+1} \approx 0.83
$$
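A corresponding sketch with scikit-learn's `recall_score`; the `pos_label` argument selects which class is treated as the positive class:

```python
from sklearn.metrics import recall_score

print(recall_score(y_true, y_pred, pos_label=1))  # 0.5          recall of class P (label 1)
print(recall_score(y_true, y_pred, pos_label=0))  # approx 0.833 recall of class N (label 0)
```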
7.1.4. Precision
The precision of class \(i\) is the ratio of true class-\(i\) elements among all elements which have been predicted to be class \(i\). The precision of class \(P\) is

$$
Prec_P = \frac{TP}{TP+FP}
$$

and for class \(N\):

$$
Prec_N = \frac{TN}{TN+FN}
$$

In the example:

$$
Prec_P = \frac{2}{2+1} \approx 0.67, \qquad Prec_N = \frac{5}{5+2} \approx 0.71
$$
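The analogous sketch with scikit-learn's `precision_score`:

```python
from sklearn.metrics import precision_score

print(precision_score(y_true, y_pred, pos_label=1))  # approx 0.667 precision of class P (label 1)
print(precision_score(y_true, y_pred, pos_label=0))  # approx 0.714 precision of class N (label 0)
```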
7.1.5. F1-Score
The F1-score of a class \(i\) is the harmonic mean of this class' precision and recall:

$$
F1_i = \frac{2 \cdot Prec_i \cdot Rec_i}{Prec_i + Rec_i}
$$

In the example:

$$
F1_P = \frac{2 \cdot 0.67 \cdot 0.5}{0.67 + 0.5} \approx 0.57, \qquad F1_N = \frac{2 \cdot 0.71 \cdot 0.83}{0.71 + 0.83} \approx 0.77
$$
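A final sketch with scikit-learn's `f1_score`; `classification_report` prints precision, recall and F1 for both classes at once:

```python
from sklearn.metrics import f1_score, classification_report

print(f1_score(y_true, y_pred, pos_label=1))  # approx 0.571 F1 of class P (label 1)
print(f1_score(y_true, y_pred, pos_label=0))  # approx 0.769 F1 of class N (label 0)

# Summary of precision, recall and F1 for all classes
print(classification_report(y_true, y_pred))
```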