Threshold Analysis and Error-Cost Analysis

In LityxIQ, you can interactively analyze decision thresholds and error costs based on a binary classification model's ROC curve.  For a given model, the continuous scores (between 0 and 1) it assigns to scored records might be used in external applications to make a binary decision.  This analysis area in LityxIQ allows you to understand, simulate, and optimize the result of choosing that binary decision threshold.
 
To get started, select the model, then click Evaluate & Explore -> Threshold Analysis from the Selected Model menu or from the right-click menu.
 
 
This opens a window for conducting the analysis interactively.  The numbered areas of the window are explained below:
 
 
1) Model Version/Iteration - select the model version and iteration you would like to analyze.
 
2) Interactive settings
  • Cost of a False Negative / Cost of a False Positive - the error cost to assign to each false negative and each false positive prediction the model makes at the given threshold.  The values entered matter only in relation to one another.  For example, if a false negative in your scenario costs five times as much as a false positive, you could set the Cost of a False Negative to 5 and the Cost of a False Positive to 1.  Setting these values to 10 and 2 would provide an equivalent result.
  • Population Pct Positive - the percentage of records in the population expected to actually be positive (e.g., have a disease or respond to a campaign).  Setting this to 50 is a good starting point for cost analysis because it gives equal weight to positive and negative records in the population.
  • Note that when you change these values, the Error Cost Analysis curves change, and the optimal decision threshold is recomputed and shown as the dark grey vertical line on that plot.
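
The Python sketch below illustrates why only the relative costs matter and how Population Pct Positive can shift the optimum. It assumes the common expected-cost formula, error cost per 100 records = 100 * (p * FNR * cost_fn + (1 - p) * FPR * cost_fp), where p is Population Pct Positive divided by 100; LityxIQ's internal computation is not documented here and may differ. The function names and the four operating points are hypothetical.

```python
# Sketch only: assumes the common expected-cost formula; all names are hypothetical.

def error_cost_per_100(fpr, fnr, cost_fn, cost_fp, pct_positive):
    """Expected cost of errors per 100 records at one threshold (assumed formula)."""
    p = pct_positive / 100.0
    # Missed positives cost cost_fn each; wrongly flagged negatives cost cost_fp each.
    return 100.0 * (p * fnr * cost_fn + (1.0 - p) * fpr * cost_fp)


# Hypothetical operating points: (threshold, false positive rate, false negative rate).
OPERATING_POINTS = [
    (0.2, 0.60, 0.05),
    (0.4, 0.30, 0.15),
    (0.6, 0.12, 0.35),
    (0.8, 0.04, 0.60),
]


def best_threshold(cost_fn, cost_fp, pct_positive):
    """Return the threshold among OPERATING_POINTS with the lowest expected cost."""
    threshold, _, _ = min(
        OPERATING_POINTS,
        key=lambda pt: error_cost_per_100(pt[1], pt[2], cost_fn, cost_fp, pct_positive),
    )
    return threshold


if __name__ == "__main__":
    print(best_threshold(5, 1, 50))   # 0.2
    print(best_threshold(10, 2, 50))  # 0.2: same cost ratio, same optimum
    print(best_threshold(5, 1, 10))   # 0.6: fewer positives favors a higher threshold
```

Under this formula, doubling both costs (10 and 2 instead of 5 and 1) doubles every cost value, so the lowest-cost threshold does not move. Lowering the positive rate to 10% makes false positives dominate, which pushes the optimum toward a higher threshold.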

3) Error Cost Analysis - The curves and information on this chart change interactively as you modify the interactive settings or the threshold simulation.  They are explained here:
  • Error Cost - Based on the current settings, this is the expected cost of the errors made per 100 records, for any decision threshold between 0 and 1 (inclusive).  The lowest point on this curve marks the decision threshold with the lowest error cost.  The curve shows the tradeoff between the cost of false positive and false negative errors, given the cost entered for each and the model's tendency to make each type of error.
  • False Positive Rate - For each possible decision threshold, this curve shows what the false positive rate would be.
  • False Negative Rate - For each possible decision threshold, this curve shows what the false negative rate would be.
  • Dark and light grey vertical lines - The dark grey line is placed at the threshold where the minimum error cost occurs, given the current model and settings.  When you change the threshold using the Threshold Simulation, a light grey line also appears at the simulated value so you can visually compare it with the optimal value.
  • Subtitle - This shows the optimal threshold value under the current interactive settings, along with the minimum error cost at that value.
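
As a rough illustration of how curves like these can be produced, the Python sketch below sweeps thresholds from 0 to 1 over a small, made-up set of scored records with known outcomes, computes the false positive and false negative rates at each step, and picks the lowest-cost threshold (the analogue of the dark grey line and the subtitle). The rule that a score at or above the threshold counts as a positive prediction, the 0.01 step size, the cost formula, and all names are assumptions, not LityxIQ's documented behavior.

```python
# Sketch only: hypothetical data; assumes score >= threshold means "predict positive".

def rates_at_threshold(scores, labels, threshold):
    """Return (false positive rate, false negative rate) at one threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives  # both classes must be present
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp / negatives, fn / positives


def error_cost_curve(scores, labels, cost_fn, cost_fp, pct_positive, steps=100):
    """List of (threshold, FPR, FNR, cost per 100 records) for thresholds 0..1."""
    p = pct_positive / 100.0
    curve = []
    for i in range(steps + 1):
        t = i / steps
        fpr, fnr = rates_at_threshold(scores, labels, t)
        cost = 100.0 * (p * fnr * cost_fn + (1.0 - p) * fpr * cost_fp)
        curve.append((t, fpr, fnr, cost))
    return curve


def optimal_threshold(curve):
    """Lowest-cost grid point; ties go to the smallest threshold."""
    t, _, _, cost = min(curve, key=lambda row: row[3])
    return t, cost


SCORES = [0.05, 0.15, 0.30, 0.35, 0.45, 0.55, 0.62, 0.70, 0.85, 0.95]
LABELS = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]  # 1 = actual positive

if __name__ == "__main__":
    curve = error_cost_curve(SCORES, LABELS, cost_fn=5, cost_fp=1, pct_positive=50)
    print(optimal_threshold(curve))  # (0.31, 20.0)
```

With these hypothetical records and a 5:1 cost ratio, every threshold from 0.31 to 0.35 catches all positives while flagging only two negatives, so they tie at the minimum cost; `min` reports the first of them.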

4) Detailed Metrics and 5) Confusion Matrix - These sections change interactively as you modify the interactive settings or the threshold simulation.  They show a variety of metrics based on the current settings and threshold.
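
The exact metrics LityxIQ reports are not listed here, but a confusion matrix and a few standard metrics derived from it can be sketched as follows. The threshold rule (a score at or above the threshold means a positive prediction), the sample data, and the function names are assumptions for illustration.

```python
# Sketch only: hypothetical data; assumes score >= threshold means "predict positive".

def confusion_matrix(scores, labels, threshold):
    """Count true/false positives and negatives at one threshold."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for score, actual in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1:
            counts["TP" if actual == 1 else "FP"] += 1
        else:
            counts["FN" if actual == 1 else "TN"] += 1
    return counts


def detailed_metrics(cm):
    """A few standard metrics derived from the confusion matrix."""
    tp, fp, tn, fn = cm["TP"], cm["FP"], cm["TN"], cm["FN"]
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # true positive rate
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # true negative rate
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


SCORES = [0.05, 0.15, 0.30, 0.35, 0.45, 0.55, 0.62, 0.70, 0.85, 0.95]
LABELS = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]  # 1 = actual positive

if __name__ == "__main__":
    cm = confusion_matrix(SCORES, LABELS, threshold=0.6)
    print(cm)  # {'TP': 3, 'FP': 1, 'TN': 4, 'FN': 2}
    print(detailed_metrics(cm))
```

At a threshold of 0.6, the sample data gives a sensitivity of 0.6, a specificity of 0.8, a precision of 0.75, and an accuracy of 0.7; moving the threshold changes all four, which is what the interactive sections reflect.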

6) Threshold Simulation - Although the objective of this analysis is to determine the optimal decision threshold (which LityxIQ computes automatically), you can also see how other decision thresholds affect the error cost and other metrics.  Use this slider to change the decision threshold and see the results interactively in sections (3), (4), and (5) described above.

7) ROC Curve - This shows the ROC curve of the selected model version and iteration.  It is a helpful reminder of the model's strength, and it is the basis for all of the calculations in this analysis.  It remains static for a selected model/iteration: it is fixed and not affected by the interactive settings.
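
Each threshold determines both a false positive rate and a true positive rate, so each point on an ROC curve corresponds to one candidate threshold, and the false negative rate used in the error cost analysis is simply 1 minus the true positive rate. The Python sketch below builds ROC points from a small, hypothetical set of scored records by using each distinct score as a cutoff, and summarizes the curve by the area under it (AUC) using the trapezoidal rule. It illustrates the standard technique only; it is not LityxIQ's implementation, and the names are hypothetical.

```python
# Sketch only: hypothetical data; standard ROC construction, not LityxIQ's code.

def roc_points(scores, labels):
    """(FPR, TPR) pairs, starting at (0, 0) and using each distinct score as a cutoff."""
    positives = sum(labels)
    negatives = len(labels) - positives  # both classes must be present
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def fnr_by_point(points):
    """False negative rate at each ROC point: FNR = 1 - TPR."""
    return [1.0 - tpr for _, tpr in points]


SCORES = [0.05, 0.15, 0.30, 0.35, 0.45, 0.55, 0.62, 0.70, 0.85, 0.95]
LABELS = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]  # 1 = actual positive

if __name__ == "__main__":
    pts = roc_points(SCORES, LABELS)
    print(pts[-1])  # (1.0, 1.0): the lowest cutoff predicts every record positive
    print(round(auc_trapezoid(pts), 4))  # 0.88
```

For the sample data the AUC is 0.88, which matches the fraction of positive/negative record pairs in which the positive record scores higher (22 of 25).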