Reasons for NULL or Missing Scores in Scoring Catalog

In some situations, the scores produced by a scoring job may be missing (blank or NULL).  There are two common reasons this can occur.  Both are described below, with recommendations for resolving the issue.  Note that resolving missing scores is not always possible; it depends on the data itself and, in some cases, on the algorithm being used.

Situation 1 - New Value of a Categorical Variable

Consider a categorical variable that was used to build the model and that had the values "A", "B", and "C" in the modeling dataset.  If the dataset being scored against this model now contains a previously unseen value "D" for this variable, the affected records can receive a missing score.  The reason is that the model, and therefore the scoring job, has no information about the patterns associated with the value "D".  Missing scores in this case are a sign that the structure and values of your dataset have changed since the model was built.
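Before running a scoring job, it can help to check whether the scoring dataset contains category values the model never saw.  The following is a minimal sketch in plain Python, not a feature of the product; the function name unseen_values and the sample lists are hypothetical and only mirror the "A", "B", "C", "D" example above.

```python
def unseen_values(training_values, scoring_values):
    """Return category values present in the scoring data but absent from training."""
    known = set(training_values)
    return sorted(set(scoring_values) - known)


# Hypothetical sample data mirroring the example in the text.
training = ["A", "B", "C", "A", "B"]
scoring = ["A", "D", "C", "D"]

new_values = unseen_values(training, scoring)
print(new_values)  # ['D']
```

Any value reported here is a candidate cause of missing scores for the records that carry it.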

Potential Solutions to Consider

1. If it has been a while since the model was built, consider rebuilding the model with the new dataset.  This ensures that new values such as "D" (in the above example) are accounted for.

2. When preparing data for a model, you can create a binned version of the variable that groups all values that are not among its most frequently occurring values into a single bin.  This "small percentage" bin then becomes part of the model estimation.  When it comes time to score a new dataset in which a new value such as "D" appears, records with "D" are assigned to the "small percentage" bin and get valid scores.  The caveat is that this approach assumes that newly appearing values behave similarly to previously seen low-percentage values.
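The binning approach in solution 2 can be sketched as follows.  This is an illustration in plain Python under stated assumptions, not the product's own binning feature: the 5% threshold, the helper names fit_frequent_values and bin_value, and the extra value "E" in the sample data are all hypothetical.

```python
from collections import Counter

OTHER_BIN = "small percentage"  # label for the catch-all bin (hypothetical)


def fit_frequent_values(values, min_share=0.05):
    """Learn, from the modeling data only, which values are frequent enough to keep."""
    counts = Counter(values)
    total = len(values)
    return {value for value, n in counts.items() if n / total >= min_share}


def bin_value(value, frequent):
    """Keep frequent values as-is; send everything else to the catch-all bin."""
    return value if value in frequent else OTHER_BIN


# Hypothetical modeling data: "C" and "E" are rare, so they share one bin.
training = ["A"] * 50 + ["B"] * 45 + ["C"] * 3 + ["E"] * 2
frequent = fit_frequent_values(training, min_share=0.05)
binned_training = [bin_value(v, frequent) for v in training]

# At scoring time the never-seen value "D" falls into the same bin.
scoring = ["A", "D", "C"]
binned_scoring = [bin_value(v, frequent) for v in scoring]
print(binned_scoring)  # ['A', 'small percentage', 'small percentage']
```

The set of frequent values is learned once from the modeling data and reused unchanged at scoring time, so the bin definition stays consistent between the two stages.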

 

Situation 2 - Missing Values in a Predictor Variable for the First Time

Another situation where scores can be missing is when a predictor variable had no missing values in the modeling dataset but now has missing values in the dataset being scored.  As in the situation above, these missing values are unknown to the model, so the associated scoring job may have no information on how to create scores for the affected records.

Potential Solutions to Consider

1. If the field is numeric, the model build setting Missing Value Handling for Numeric Values can be set to "Replace with Variable Mean".  This ensures that any missing values that occur, whether in the modeling stage or the scoring stage, are replaced with a valid value (the mean of the variable).  The caveat is that the replacement is only an imputation of the true value and is not necessarily the correct value.  However, if it is applied consistently and there are not too many missing values, this is generally a reasonable approach.

2. If the field is categorical, the model build setting Missing Value Handling for Categorical Variables can similarly be set to "Replace with Mode".  This has an effect similar to replacing missing values with the variable mean for numeric fields, and the same caveat applies.
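To illustrate what mean and mode replacement do conceptually, here is a minimal sketch in plain Python.  It is not the product's implementation, and it assumes that the fill values are learned from the modeling data and then reused at scoring time; the helper names and sample values are hypothetical, and None stands in for a missing value.

```python
from statistics import mean, mode


def fit_fill_values(numeric_training, categorical_training):
    """Compute the replacement values from the non-missing modeling data."""
    numeric_fill = mean([v for v in numeric_training if v is not None])
    categorical_fill = mode([v for v in categorical_training if v is not None])
    return numeric_fill, categorical_fill


def fill_missing(values, fill):
    """Replace each missing value (None) with the given fill value."""
    return [fill if v is None else v for v in values]


# Hypothetical modeling data with no missing values.
train_amount = [40.0, 50.0, 60.0]
train_segment = ["A", "B", "A"]
amount_fill, segment_fill = fit_fill_values(train_amount, train_segment)

# Hypothetical scoring data where missing values appear for the first time.
score_amount = fill_missing([55.0, None], amount_fill)
score_segment = fill_missing([None, "B"], segment_fill)
print(score_amount)   # [55.0, 50.0]
print(score_segment)  # ['A', 'B']
```

Every record ends up with a valid value for each predictor, so the scoring job has something to work with, at the cost of treating imputed values as if they were observed.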