Each algorithm in LityxIQ has an Advanced Settings tab. It gives more advanced users detailed control over various parts of the pre-processing that LityxIQ performs when evaluating the data and variables in the model. These settings are initially populated with the most common and useful values that work across many situations, so it is not necessary to change them to build good models.
The options are explained below. Note that not all of the options listed are available for all algorithms.
Numeric Autocorrelation Threshold - If autocorrelation checking is turned on, then numeric predictors with correlation greater than this number will be removed.
Categorical Autocorrelation Threshold - If autocorrelation checking is turned on, then categorical predictors with 1-(chi-square significance) greater than this number will be removed.
Higher Order Term Threshold - If higher order terms (interactions, polynomials) are being identified in regression models, they must meet this p-value criterion (lower than this value) to be considered.
Maximum Higher Order Terms - If higher order terms (interactions, polynomials) are being identified in regression models, this is the maximum number of such terms that will be considered.
Normalization Skewness Threshold - If numeric predictors are being normalized and a predictor's skewness coefficient is larger than this value, the predictor is log-transformed instead of standardized.
Categorical P-value Binning Threshold - If categorical predictors are being binned, different groups are not combined if their dependence p-value is lower than this value.
Categorical Binning Minimum Bin Size - If categorical predictors are being binned, this is the smallest number of observations allowed in a bin.
Categorical Binning Minimum Number Bins - If categorical predictors are being binned, this is the smallest number of bins allowed for each predictor.
Maximum Distinct Categorical Values - This is the maximum number of unique values any categorical predictor is allowed to have. Many algorithms have other restrictions that limit the number of allowed unique values to fewer than the number entered here.
Categorical Predictor Domination Threshold - If a single value of a categorical predictor represents more than this percentage of all values for that predictor, the predictor is not considered in the model.
Minimum Unique Numeric Values - If a numeric field has this number of unique values or fewer, it will be categorized and treated as categorical.
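To make a few of these rules concrete, the Python sketch below illustrates the logic described for the Normalization Skewness Threshold, the Categorical Predictor Domination Threshold, and Minimum Unique Numeric Values. It is only an illustration of the descriptions above, not LityxIQ's actual implementation. The function names, the threshold values used in the usage lines, the choice of the population skewness formula, and the assumption that log-transformed values are strictly positive are all hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def skewness(values):
    """Population skewness coefficient (third standardized moment)."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        return 0.0  # a constant field has no skew
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)


def normalize(values, skew_threshold):
    """Log-transform a highly skewed predictor; otherwise standardize it.

    Assumption: values are strictly positive whenever the log branch is taken.
    """
    if skewness(values) > skew_threshold:
        return [math.log(v) for v in values]
    m, s = mean(values), pstdev(values)
    if s == 0:
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def is_dominated(values, domination_pct):
    """True if the most frequent value makes up more than domination_pct
    percent of all values, so the predictor would be dropped."""
    top_count = Counter(values).most_common(1)[0][1]
    return 100.0 * top_count / len(values) > domination_pct


def treat_as_categorical(values, min_unique):
    """True if a numeric field has min_unique or fewer distinct values."""
    return len(set(values)) <= min_unique


# Usage with hypothetical threshold values (not LityxIQ defaults).
skewed = normalize([1, 1, 1, 1, 100], skew_threshold=1.0)     # log branch
symmetric = normalize([1, 2, 3, 4, 5], skew_threshold=1.0)    # standardized
dropped = is_dominated(["a"] * 96 + ["b"] * 4, domination_pct=95)
as_category = treat_as_categorical([0, 1, 1, 0, 2], min_unique=3)
```

In this sketch each rule is a strict comparison against its setting ("larger than", "more than"), except the unique-value rule, which follows the "this number of unique values or fewer" wording and therefore includes the threshold itself.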