The C-Statistic, also called the Concordance Statistic or C-Stat, is a common metric used to analyze the performance of binary classification models and to compare multiple models to one another.

Specifically, the C-statistic is computed as the area under the ROC curve. The minimum value of c is 0.0 and the maximum is 1.0. C-values of 0.7 to 0.8 indicate acceptable discrimination, values of 0.8 to 0.9 indicate excellent discrimination, and values of 0.9 or higher indicate outstanding discrimination.
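The C-statistic can also be computed directly from its concordance definition: the fraction of (positive, negative) record pairs in which the positive record receives the higher score. The sketch below is a minimal pure-Python illustration of that definition using made-up scores; it is not LityxIQ's implementation.

```python
from itertools import product

def c_statistic(y_true, y_score):
    """Concordance: fraction of (positive, negative) pairs in which the
    positive record has the higher score; ties count as 0.5."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    concordant = 0.0
    for p, n in product(positives, negatives):
        if p > n:
            concordant += 1.0
        elif p == n:
            concordant += 0.5
    return concordant / (len(positives) * len(negatives))

# Scores that separate the classes perfectly give c = 1.0
print(c_statistic([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # 1.0
```

A model whose scores carry no information about the outcome yields a value near 0.5, matching the random-assignment baseline described below.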

An ROC curve is a graphical plot of the true positive rate (TPR) against the false positive rate (FPR). TPR is also known as sensitivity, while FPR equals one minus the specificity. The C-Statistic will be large, near 1.0, when the ROC curve hugs the upper left corner of the plot; this represents a model that combines a low false positive rate with a high true positive rate. A model that is not much better than randomly assigning outputs will have an ROC curve near the 45-degree diagonal and a C-Statistic near 0.5.
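Each point on the ROC curve corresponds to one choice of decision threshold. The sketch below, using toy data, computes the (TPR, FPR) point for a given threshold; sweeping the threshold from 1 down to 0 traces out the full curve.

```python
def tpr_fpr(y_true, y_score, threshold):
    """Classify score >= threshold as positive; return (TPR, FPR)."""
    tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= threshold)
    tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < threshold)
    return tp / (tp + fn), fp / (fp + tn)

y_true  = [0, 0, 0, 1, 1, 1]      # toy labels
y_score = [0.2, 0.4, 0.6, 0.5, 0.7, 0.9]  # toy model scores
for t in (0.8, 0.5, 0.3):
    print(t, tpr_fpr(y_true, y_score, t))
```

Lowering the threshold moves the point up and to the right: both TPR and FPR can only increase or stay the same.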

For more information on ROC Curves and the C-Statistic, see:

- https://www.statisticshowto.com/c-statistic/
- https://en.wikipedia.org/wiki/Receiver_operating_characteristic

In LityxIQ, you can perform an interactive analysis of decision thresholds and error costs based on a binary classification model's ROC curve. For a given model, the continuous scores (between 0 and 1) it assigns to scored records may be used in external applications to make a binary decision. This analysis area in LityxIQ allows you to understand, simulate, and optimize the choice of decision threshold for classifying records.

After selecting Performance Analysis for the model, select "ROC and Error Cost Analysis" from the Analysis Type menu, and refer to the information below for more help.

**Cost of a False Negative / Cost of a False Positive**- the error cost assigned to false negative and false positive predictions the model makes at the given threshold. The values entered only matter in relation to one another. For example, if the cost of making a false negative prediction in your scenario is five times greater than that of a false positive, you could set 5 for the Cost of a False Negative and 1 for the Cost of a False Positive. Setting these values to, for example, 10 and 2 would give an equivalent result.
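The claim that only the ratio matters can be checked with a short sketch: scaling both costs by the same factor scales every total cost equally, so the ranking of candidate thresholds is unchanged. The threshold labels and error counts below are made up for illustration.

```python
def total_cost(fn_count, fp_count, cost_fn, cost_fp):
    """Total error cost for a given mix of false negatives and positives."""
    return fn_count * cost_fn + fp_count * cost_fp

# Hypothetical error mixes at two candidate thresholds: (FN count, FP count)
errors = {"t=0.3": (2, 10), "t=0.6": (6, 3)}

for cost_fn, cost_fp in [(5, 1), (10, 2)]:   # same 5:1 ratio, different scale
    costs = {t: total_cost(fn, fp, cost_fn, cost_fp)
             for t, (fn, fp) in errors.items()}
    print(cost_fn, cost_fp, costs, "best:", min(costs, key=costs.get))
```

With either cost pair, the same threshold wins; the (10, 2) pair simply doubles every total.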

**Population Pct Positive**- the percentage of records in the population expected to actually have a positive result (e.g., have a disease, respond to a campaign, etc.). Setting this to 50 is a good starting point for Cost Analysis, as it gives equal weight to positive and negative records in the population.

Note that any particular combination of the settings above leads to a specific Error Cost Curve, shown in the "Error Cost Analysis" plots, and a specific "Optimal decision threshold" for the model.

**Threshold Simulation**- this slider allows you to see the effect of using threshold values other than the optimal on metrics like error rates and risk cost scores. Moving the slider will not change the Error Cost Curve, but it will change the values shown in the Confusion Matrix and Metrics section, and the vertical line in the Error Cost Analysis plot will shift to mark your selected threshold.
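What the slider does conceptually can be mimicked in a few lines: recompute the confusion matrix at each simulated threshold while the underlying scores stay fixed. The data below is made up for illustration.

```python
def confusion_matrix(y_true, y_score, threshold):
    """Counts (TP, FP, FN, TN) when score >= threshold means 'positive'."""
    tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < threshold)
    return tp, fp, fn, tn

y_true  = [0, 0, 0, 1, 1, 1]                 # toy labels
y_score = [0.2, 0.4, 0.6, 0.5, 0.7, 0.9]     # toy model scores
for t in (0.3, 0.5, 0.8):                    # simulate three thresholds
    tp, fp, fn, tn = confusion_matrix(y_true, y_score, t)
    print(f"threshold {t}: TP={tp} FP={fp} FN={fn} TN={tn}")
```

Raising the simulated threshold trades false positives for false negatives, which is why the confusion matrix and its derived metrics change while the cost curve itself does not.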