LityxIQ can be set to automatically handle missing values in the modeling dataset based on the settings on the Missing Value Handling tab. Missing values are often expected in real datasets, so it is important to use appropriate techniques to deal with them during the modeling process.
- Remove Variables with Missing Values - Check this box if you wish to remove variables that have too high a percentage of missing values. See the next option for how to define the cutoff percentage.
- Remove if Pct Missing Greater Than - Use this to set the maximum percentage of missing values allowed for a variable before it is removed from the modeling dataset. This option is only available if the checkbox above is checked.
- Handling for Numeric Variables - Select the method used to handle missing values in numeric variables. The options are:
  - Replace With Variable Mean - Use this option to have all missing values for a variable replaced with the average of the non-missing values for that variable.
  - Replace with Zero - Use this option to have all missing values for a variable replaced with zero. This is often not appropriate, but it makes sense for datasets in which a missing value genuinely means zero.
  - Categorize the Variable - This option performs intelligent numeric binning on any numeric variable that has missing values, and creates an additional bin that contains all the records with missing values. It is often a good choice when values are missing for a particular reason, or when the dataset has many missing values, because records with missing values are kept in the modeling process rather than removed. Note that this option overrides the algorithm setting related to binning numeric variables: even if you have turned off numeric binning, any numeric variable that has missing values will still be categorized (binned).
  - Remove Row - Use this option to completely remove from the modeling dataset any row that has a missing value.
- Handling for Categorical Variables - Select the method used to handle missing values in categorical variables. The options are:
  - Replace with Blank Value - Use this option to have all missing values for a variable replaced with a BLANK value. This effectively treats the missing values as a new group in their own right.
  - Replace with Mode - Use this option to have all missing values for a variable replaced with the mode for that variable. The mode is the value that occurs most frequently.
  - Remove Row - Works the same way as Remove Row for numeric variables above: any row that has a missing value is removed from the modeling dataset.
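
LityxIQ applies these settings internally, and its implementation is not shown in this article. Purely to illustrate what each option does conceptually, here is a minimal Python sketch using only the standard library. The toy dataset and every function name are hypothetical, `None` stands in for a missing value, and the fixed cut points in `categorize` are a simple stand-in for LityxIQ's intelligent binning, whose algorithm is not described here.

```python
from bisect import bisect_right
from collections import Counter
from statistics import mean

# Hypothetical toy dataset; None marks a missing value.
rows = [
    {"age": 30,   "income": 50000, "region": "East"},
    {"age": None, "income": 62000, "region": "West"},
    {"age": 40,   "income": None,  "region": None},
    {"age": 50,   "income": None,  "region": "East"},
]

def pct_missing(rows, col):
    """Percentage of rows in which `col` is missing."""
    return 100.0 * sum(r[col] is None for r in rows) / len(rows)

def drop_sparse_columns(rows, max_pct_missing):
    """Remove variables whose percent missing is greater than the cutoff."""
    keep = [c for c in rows[0] if pct_missing(rows, c) <= max_pct_missing]
    return [{c: r[c] for c in keep} for r in rows]

def fill_missing(rows, col, value):
    """Replace missing values in `col` with a fixed value (zero, "BLANK", ...)."""
    return [{**r, col: value if r[col] is None else r[col]} for r in rows]

def fill_mean(rows, col):
    """Replace missing values with the mean of the non-missing values."""
    return fill_missing(rows, col, mean(r[col] for r in rows if r[col] is not None))

def fill_mode(rows, col):
    """Replace missing values with the most frequent non-missing value."""
    counts = Counter(r[col] for r in rows if r[col] is not None)
    return fill_missing(rows, col, counts.most_common(1)[0][0])

def categorize(rows, col, cut_points):
    """Bin a numeric column and put missing values in their own bin.

    Fixed, sorted cut points are an assumption made for this sketch only.
    """
    def label(v):
        return "MISSING" if v is None else f"bin_{bisect_right(cut_points, v)}"
    return [{**r, col: label(r[col])} for r in rows]

def drop_rows_missing(rows, cols):
    """Remove any row that has a missing value in one of `cols`."""
    return [r for r in rows if all(r[c] is not None for c in cols)]

# Example: drop variables that are more than 40% missing, then impute the rest.
cleaned = drop_sparse_columns(rows, max_pct_missing=40)
cleaned = fill_mean(cleaned, "age")
cleaned = fill_missing(cleaned, "region", "BLANK")
```

In this sketch, `income` is missing in half of the rows and is dropped by the 40% cutoff, while `age` and `region` are kept and imputed. Note how `categorize` and the `"BLANK"` replacement both keep every row and preserve the fact that a value was missing, whereas `drop_rows_missing` shrinks the dataset.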