Performance Analysis: Metrics to Analyze: Classification Models

This article describes the machine learning performance metrics that LityxIQ provides for binary classification-style models. In LityxIQ, model types such as Binary Classification, Affinity, Churn, Response, Risk, and others fall into this category.

Lift - An overall measure of how well the model sorts targets ahead of non-targets. It ranges from 0 to 100, with 0 being no better than random. If the model were perfect, all of the targets would be assigned higher scores than all of the non-targets and the lift would be 100.

Lift 1 vs. 2 – The performance of decile 1 as compared to decile 2, expressed as an index. For example, a value of 1.2586 indicates decile 1 performance is nearly 26% better than decile 2.

Lift 1 vs. 10 - The performance of decile 1 as compared to decile 10, expressed as an index. For example, a value of 7.1257 indicates decile 1 performance is roughly 7x better than decile 10. NOTE: if this appears blank, the value was generally infinite, i.e., the denominator was zero because there were no responses in the 10th decile. This can happen when tied scores cause all of the predicted scores to be assigned to earlier deciles.

Lift 1 and 2 vs. Rest – The performance of deciles 1 and 2 combined as compared to deciles 3-10 combined. For example, a value of 2.2249 indicates the combined performance of deciles 1 and 2 is about 2.2x better than the bottom 8 deciles.

Lift 1 Over Random – The performance of decile 1 as compared to the population total. For example, a value of 1.9918 indicates decile 1 performance is about 99% better than, or nearly 2x, what you would expect if you were to select randomly from the population.
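These decile-based lift metrics can be reproduced from any scored dataset. Below is a minimal sketch in Python (not LityxIQ code); the DataFrame, column names, and decile construction are illustrative assumptions, with decile 1 holding the highest-scoring 10% of records.

```python
import pandas as pd

def decile_lift_metrics(df, score_col="score", actual_col="actual"):
    """Compute decile response rates and the lift ratios described above.

    Assumes df has a numeric model score column and a 0/1 actual outcome column.
    """
    df = df.copy()
    # Rank records from best to worst score and cut them into 10 equal groups.
    df["decile"] = pd.qcut(
        df[score_col].rank(method="first", ascending=False),
        10, labels=list(range(1, 11))
    ).astype(int)

    rate = df.groupby("decile")[actual_col].mean()   # response rate per decile
    overall = df[actual_col].mean()                  # population response rate

    top2 = df.loc[df["decile"].isin([1, 2]), actual_col].mean()
    rest = df.loc[df["decile"] > 2, actual_col].mean()

    return {
        "Lift 1 vs. 2": rate[1] / rate[2],
        # Blank (None) when decile 10 has no responses, matching the note above.
        "Lift 1 vs. 10": rate[1] / rate[10] if rate[10] > 0 else None,
        "Lift 1 and 2 vs. Rest": top2 / rest,
        "Lift 1 Over Random": rate[1] / overall,
    }
```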

C-Statistic - The C-stat is the area under the ROC curve. The minimum value is 0.0 and the maximum is 1.0. C-values of 0.7 to 0.8 indicate acceptable discrimination, values of 0.8 to 0.9 indicate excellent discrimination, and values of 0.9 or above indicate outstanding discrimination. An ROC curve is a graphical plot of the true positives out of the total actual positives (TPR = true positive rate) vs. the false positives out of the total actual negatives (FPR = false positive rate). TPR is also known as sensitivity, while FPR equals one minus the better-known specificity. In addition, see https://support.lityxiq.com/364317-C-Statistic-and-ROC-Curves for more information on analyzing the model's ROC curve and decision thresholds.
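As a point of reference, the C-statistic and the ROC curve can be reproduced outside LityxIQ with scikit-learn; the data below is made up purely for illustration and is not LityxIQ's internal calculation.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Illustrative data: 0/1 actual outcomes and model-predicted probabilities
y_true  = [0, 0, 1, 0, 1, 1, 0, 1]
y_score = [0.10, 0.25, 0.40, 0.35, 0.80, 0.65, 0.30, 0.90]

# C-statistic = area under the ROC curve
c_stat = roc_auc_score(y_true, y_score)

# ROC curve points: false positive rate (1 - specificity) vs. true positive rate (sensitivity)
fpr, tpr, thresholds = roc_curve(y_true, y_score)

print(f"C-statistic: {c_stat:.3f}")
```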

Ranked Value Correlation* – Uses the ranking of the model scores from best to worst and calculates the correlation coefficient with the actual occurrences. The value will always be between 0 and 1, with higher values being better. *This metric is not included on the graph because its scale is much smaller than the other metrics, so its bar would not be visible.
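LityxIQ's exact formula is not documented here, but a comparable rank-based correlation can be computed as a Spearman correlation between the model scores and the actual outcomes; treat the sketch below as an approximation for intuition rather than the LityxIQ calculation.

```python
from scipy.stats import spearmanr

# Illustrative data: higher scores should line up with actual 1s
y_true  = [0, 0, 1, 0, 1, 1, 0, 1]
y_score = [0.10, 0.25, 0.40, 0.35, 0.80, 0.65, 0.30, 0.90]

# Spearman correlation uses the ranking of the scores rather than their raw values.
# Note: Spearman can be negative in general; the LityxIQ metric is reported on a 0-1 scale.
rank_corr, _ = spearmanr(y_score, y_true)
print(f"Rank correlation: {rank_corr:.3f}")
```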

The key to understanding the following metrics is understanding the numerator and denominator of each. They all derive from the matrix below. A prediction is categorized as a 1 if the predicted score is greater than or equal to the population rate. For example, if the population response rate is .015 (1.5%) and the model prediction/score is .016, then that record has a predicted result of 1, whereas a record with a prediction/score of, say, .01 has a predicted result of 0.

                              Actual Result
                              1                        0
Predicted Result    1         True Positives (a)       False Positives (b)
                    0         False Negatives (c)      True Negatives (d)

False Positive Rate = b / (b+d): what percent of the actual negatives were incorrectly predicted

False Negative Rate = c / (a+c): what percent of the actual positives were incorrectly predicted

Positive Predictive Value = a / (a+b): what percent of the predicted positives were correct

Negative Predictive Value = d / (c+d): what percent of the predicted negatives were correct

Sensitivity (True Positive Rate) = a / (a+c): what percent of the actual positives were correctly predicted

Specificity (True Negative Rate) = d / (b+d): what percent of the actual negatives were correctly predicted

Percent Correct* = (a+d) / (a+b+c+d): what percent of the model's predictions were correct. *This metric is not included on the graph because its scale is much larger than the other metrics, which would make the other bars too small to read.
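To make the thresholding rule and the counts a, b, c, and d concrete, here is a small illustrative sketch (not LityxIQ code) that classifies each record by comparing its score to the population response rate and then computes the rates defined above.

```python
import numpy as np

def confusion_rates(y_true, y_score):
    """Threshold scores at the population rate and compute the rates defined above."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)

    # A record is predicted 1 if its score is >= the population response rate.
    population_rate = y_true.mean()
    y_pred = (y_score >= population_rate).astype(int)

    a = np.sum((y_pred == 1) & (y_true == 1))  # true positives
    b = np.sum((y_pred == 1) & (y_true == 0))  # false positives
    c = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
    d = np.sum((y_pred == 0) & (y_true == 0))  # true negatives

    return {
        "False Positive Rate": b / (b + d),
        "False Negative Rate": c / (a + c),
        "Positive Predictive Value": a / (a + b),
        "Negative Predictive Value": d / (c + d),
        "Sensitivity (True Positive Rate)": a / (a + c),
        "Specificity (True Negative Rate)": d / (b + d),
        "Percent Correct": (a + d) / (a + b + c + d),
    }
```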

Overfit Potential - Overfitting relates to how predictive a model will be on new data it has never seen. An overfit model may look strong on the dataset on which it was built, but does not translate well to new data. The Overfit Potential metric in LityxIQ ranges from 0 to 100. Lower values represent a model that is not overfit, while larger values indicate a greater likelihood that the model is overfit. Note that even for models with a higher overfit potential, the performance metrics reported in LityxIQ (such as the ones mentioned above) will be a good representation of how the model will perform on new data, as long as you used validation techniques such as Holdout or Cross-Validation. This is because LityxIQ computes performance metrics against new data, and so attempts to report unbiased performance metrics even in the face of overfitting.
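For intuition only, the sketch below (generic scikit-learn code, not the calculation behind LityxIQ's Overfit Potential metric) compares the C-statistic on training data against a holdout set; a large gap between the two is a simple sign of overfitting, which is why validation techniques such as Holdout or Cross-Validation matter.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Illustrative data and model; any binary classifier works the same way.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_holdout, y_train, y_holdout = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

train_auc = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
holdout_auc = roc_auc_score(y_holdout, model.predict_proba(X_holdout)[:, 1])

# A much higher training C-statistic than holdout C-statistic suggests overfitting;
# the holdout number is the better estimate of performance on new data.
print(f"Train C-stat: {train_auc:.3f}  Holdout C-stat: {holdout_auc:.3f}")
```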