The Selection & Transformation tab appears when you edit the settings for a modeling algorithm. The options available on the tab depend on the algorithm you are editing. Each option is explained below. Many of the options have related advanced settings, which can be found on the Advanced Settings tab.
- Bin Categorical Predictors - If this option is turned on, Predict will search for optimal ways to bin categorical predictors (i.e., combine their categories). Turn it on by checking Yes, or turn it off by checking No. You can check both Yes and No, in which case Predict will run it both ways as separate iterations so that you can compare the results.
- Bin Numeric Predictors - If this option is turned on, Predict will search for optimal ways to bin numeric predictors (i.e., group together ranges of values). Turn it on by checking Yes, or turn it off by checking No. You can check both Yes and No, in which case Predict will run it both ways as separate iterations so that you can compare the results.
- Normalize Numeric Predictors - If this option is turned on, Predict will search for optimal ways to normalize numeric predictors (i.e., rescale their values). The normalization may involve a standard normal transformation, a log transformation, or other techniques deemed optimal. Turn it on by checking Yes, or turn it off by checking No. You can check both Yes and No, in which case Predict will run it both ways as separate iterations so that you can compare the results.
- Search for Higher Order Terms - If set to Yes, this option enables Predict to run extensive automatic searches for higher-order terms to place into the model. These include interaction terms between any sets of variables and polynomial terms for numeric variables. If set to No, this search does not take place. The Yes setting requires more time and computing power, but will often result in stronger models.
- Autocorrelation Search - If set to Yes, this option enables Predict to search for pairs of predictor variables that are strongly correlated with each other. When it finds such a pair, it removes one of the two variables from further consideration in the modeling process. Setting this option to No turns off the autocorrelation search and allows all variables chosen as candidate predictors to proceed through further modeling steps. Generally, it is a good idea to leave the autocorrelation search on, as it makes the overall process more efficient and sacrifices little in terms of model performance.
- Variable Search Direction - For certain iterative algorithms, this setting determines how variable selection procedures will be applied. The Forward setting (default) will continue to add variables to the model until certain internal criteria are met. The Forward and Backward setting will do the same, but also allows variables to be removed from the model later if doing so is deemed beneficial. The Forward setting generally requires less processing time and leads to strong models. In some cases Forward and Backward may provide a better result, although it will often lead to longer model runs.
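
To make the Autocorrelation Search option more concrete, the following is a minimal Python sketch of the general idea: find pairs of strongly correlated predictors and drop one variable from each pair. It is not Predict's implementation. The function names `pearson` and `drop_correlated`, the 0.9 threshold, the use of Pearson correlation, and the rule of keeping the earlier-listed variable are all assumptions made for illustration, since the criteria Predict applies internally are not described here.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation; treat it as 0
    return cov / sqrt(var_x * var_y)


def drop_correlated(predictors, threshold=0.9):
    """Return the predictor names kept after pruning strongly correlated pairs.

    `predictors` maps a variable name to its list of values.
    `threshold` is a hypothetical cutoff on the absolute correlation.
    """
    kept = list(predictors)
    for a, b in combinations(list(predictors), 2):
        # Only compare variables that are both still under consideration.
        if a in kept and b in kept:
            if abs(pearson(predictors[a], predictors[b])) >= threshold:
                kept.remove(b)  # keep the earlier-listed variable, drop the later one
    return kept


# Made-up example data: income_k is nearly a copy of income.
data = {
    "income": [30, 45, 60, 75, 90],
    "income_k": [30.5, 44.0, 61.0, 74.0, 91.0],
    "age": [52, 23, 41, 35, 60],
}
kept = drop_correlated(data)
print(kept)
```

In this sketch, `income_k` is removed because it is almost perfectly correlated with `income`, while `age` is only weakly correlated with `income` and stays in the candidate set. Fewer candidate predictors means less work for the later modeling steps, which is the efficiency gain the option description refers to.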