Confusion Matrix

July 30, 2023

By Admin



A confusion matrix is a performance evaluation tool used in binary classification (a classification problem with two classes) to assess a machine learning model. It summarizes the model's results by comparing predicted class labels against the actual class labels from a test dataset, and it forms the basis for key metrics such as accuracy, precision, and recall.

[Figure: Confusion matrix in tabular form]

Consider a binary classification problem with two classes: Positive (P) and Negative (N). The confusion matrix is represented in tabular form, as shown in the figure above.

Let's look at each term in the confusion matrix, with examples:

True Positive (TP): True Positive refers to the number of instances that are correctly predicted as Positive by the model. In other words, TP is the number of positive samples that the model correctly identified.

Example: In a medical diagnosis scenario, TP represents the number of patients with a disease correctly identified as having the disease by the model.

False Negative (FN): False Negative refers to the number of instances that are incorrectly predicted as Negative by the model but are actually Positive. FN represents the missed positive samples.

Example: In the medical diagnosis scenario, FN represents the number of patients with a disease who were incorrectly identified as not having the disease by the model.

False Positive (FP): False Positive refers to the number of instances that are incorrectly predicted as Positive by the model but are actually Negative. FP represents the falsely identified positive samples.

Example: In a spam email detection scenario, FP represents the number of legitimate emails that were incorrectly classified as spam by the model.

True Negative (TN): True Negative refers to the number of instances that are correctly predicted as Negative by the model. TN is the number of negative samples that the model correctly identified.

Example: In the spam email detection scenario, TN represents the number of legitimate emails correctly identified as not spam by the model.
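The four counts above can be tallied directly from paired lists of actual and predicted labels. Here is a minimal sketch in plain Python; the labels are invented purely for illustration (1 = positive, 0 = negative):

```python
# Hypothetical toy labels for a binary classification task (1 = positive, 0 = negative).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]  # actual class labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]  # model's predicted labels

# Count each cell of the confusion matrix by comparing actual vs. predicted.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # correctly predicted positive
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # missed positive
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # falsely predicted positive
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # correctly predicted negative

print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")  # prints: TP=3, FN=2, FP=1, TN=4
```

Note that the four counts always sum to the total number of test samples (here, 10).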

Based on these four values, several evaluation metrics can be calculated, including:

● Accuracy: (TP + TN) / (TP + TN + FP + FN)
● Precision: TP / (TP + FP)
● Recall (Sensitivity or True Positive Rate): TP / (TP + FN)
● Specificity (True Negative Rate): TN / (TN + FP)
● F1 Score: 2 * (Precision * Recall) / (Precision + Recall)
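The formulas above translate directly into code. This sketch uses hypothetical example counts (TP=3, FN=2, FP=1, TN=4) chosen only to make the arithmetic concrete:

```python
# Hypothetical confusion-matrix counts for illustration.
tp, fn, fp, tn = 3, 2, 1, 4

# Each metric follows the formula given in the list above.
accuracy    = (tp + tn) / (tp + tn + fp + fn)         # 7 / 10 = 0.70
precision   = tp / (tp + fp)                          # 3 / 4  = 0.75
recall      = tp / (tp + fn)                          # 3 / 5  = 0.60 (sensitivity / TPR)
specificity = tn / (tn + fp)                          # 4 / 5  = 0.80 (TNR)
f1 = 2 * (precision * recall) / (precision + recall)  # harmonic mean of precision and recall
```

Because the F1 score is the harmonic mean of precision and recall, it is pulled toward the lower of the two, which makes it useful when classes are imbalanced.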

The confusion matrix provides a comprehensive view of the model's performance, helping to identify potential areas for improvement and guiding further model adjustments or hyperparameter tuning.

Interview Questions:

1. What is a confusion matrix?

2. What are the terms in a confusion matrix?

3. How are evaluation metrics calculated from a confusion matrix?