# Machine Learning Algorithms

LityxIQ supports a number of machine learning algorithms.  Different algorithms tend to perform well in different situations, often depending on the dataset itself.  Below is a list of the supported algorithms, the types of models each supports, and, where available, a link to documentation for the algorithm that includes an overview of its settings.

| Algorithm | Continuous-value target | Binary target | Time-series | Documentation |
|---|---|---|---|---|
| Linear Regression | x | | | https://support.lityxiq.com/050447-Algorithm-Overview---Linear-and-Logistic-Regression |
| Logistic Regression | | x | | https://support.lityxiq.com/050447-Algorithm-Overview---Linear-and-Logistic-Regression |
| CART | x | x | | |
| CHAID | x | x | | |
| Neural Net | x | x | | |
| Deep Net (Coming Soon!) | x | x | | |
| Random Forest | x | x | | |
| XGBoost (Coming Soon!) | x | x | | |
| Gamma Regression | x | | | |
| Probit Regression | | x | | |
| Naïve Bayes | x | x | | |
| SVM | x | x | | |
| ARIMA | | | x | |
| Holt-Winters | | | x | |
| Loess Decomposition | | | x | |
| VAR | | | x | |

## Brief Descriptions

CART - CART (also written C&RT, Classification and Regression Trees) is a recursive partitioning method that builds classification and regression trees for predicting continuous dependent variables (regression) and categorical dependent variables (classification).
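To make "recursive partitioning" concrete, here is a minimal regression-tree sketch in plain Python. It assumes a single numeric predictor, chooses each binary split by the largest reduction in the sum of squared errors, and omits the cost-complexity pruning that a full CART implementation performs; `build_tree` and `predict` are illustrative names, not part of LityxIQ.

```python
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean: the impurity CART minimizes for regression."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(xs, ys, depth=0, max_depth=3, min_leaf=2):
    """Recursively partition 1-D data using binary splits that minimize total SSE."""
    pairs = sorted(zip(xs, ys))
    best = None  # (cost, threshold)
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical x values
        cost = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
        if best is None or cost < best[0]:
            best = (cost, (pairs[i - 1][0] + pairs[i][0]) / 2)
    # Stop at max depth, when no split is valid, or when splitting does not reduce SSE.
    if depth >= max_depth or best is None or best[0] >= sse(ys):
        return {"leaf": mean(ys)}
    threshold = best[1]
    left = [(x, y) for x, y in pairs if x <= threshold]
    right = [(x, y) for x, y in pairs if x > threshold]
    return {
        "threshold": threshold,
        "left": build_tree([x for x, _ in left], [y for _, y in left], depth + 1, max_depth, min_leaf),
        "right": build_tree([x for x, _ in right], [y for _, y in right], depth + 1, max_depth, min_leaf),
    }


def predict(tree, x):
    """Follow the splits from the root down to a leaf and return its mean."""
    while "leaf" not in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["leaf"]


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [5, 5, 5, 5, 20, 20, 20, 20]
tree = build_tree(xs, ys)
print(tree["threshold"], predict(tree, 2), predict(tree, 7.5))
```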

CHAID - a type of decision tree technique used for prediction (in a similar fashion to regression analysis) and classification, and for detecting interactions between variables. CHAID stands for CHi-squared Automatic Interaction Detection.  In practice, CHAID is often used in direct marketing to select groups of consumers and predict how their responses to some variables affect other variables.  Like other decision trees, CHAID produces output that is highly visual and easy to interpret. Because it uses multiway splits by default, it needs fairly large samples to work effectively: with small samples, the respondent groups can quickly become too small for reliable analysis.  One important advantage of CHAID over alternatives such as multiple regression is that it is non-parametric (it makes no assumptions about the form or parameters of a frequency distribution).

CHAID vs. CART:

- CHAID uses a p-value from a significance test to measure the desirability of a split, while CART uses the reduction of an impurity measure.
- CHAID searches for multi-way splits, while CART performs only binary splits.
- CHAID uses a forward stopping rule to grow a tree, while CART deliberately overfits and uses validation data to prune back.
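The first difference listed above, how each method scores a candidate split, can be sketched as follows. `gini_reduction` computes a CART-style impurity decrease, and `chaid_p_value` computes a Pearson chi-squared p-value for the child-by-class table. Both take per-child class counts. Real CHAID also merges similar categories and applies a Bonferroni adjustment, which this illustrative sketch leaves out; `chi2_sf` is a hypothetical helper for the chi-squared upper tail.

```python
import math


def gini(counts):
    """Gini impurity of a node given its class counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)


def gini_reduction(groups):
    """CART-style criterion: parent impurity minus size-weighted child impurity."""
    parent = [sum(col) for col in zip(*groups)]
    n = sum(parent)
    weighted = sum(sum(g) / n * gini(g) for g in groups)
    return gini(parent) - weighted


def chi2_sf(stat, df):
    """Chi-squared upper-tail probability via the regularized lower incomplete gamma series."""
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:
        term *= x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chaid_p_value(groups):
    """CHAID-style criterion: p-value of a chi-squared test of child node vs. class."""
    col_totals = [sum(col) for col in zip(*groups)]
    n = sum(col_totals)
    stat = 0.0
    for g in groups:
        row_total = sum(g)
        for observed, col_total in zip(g, col_totals):
            expected = row_total * col_total / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    df = (len(groups) - 1) * (len(col_totals) - 1)
    return chi2_sf(stat, df)


# A three-way split (allowed by CHAID) of a binary target, as [class 0, class 1] counts per child.
split = [[20, 5], [10, 10], [3, 22]]
print(gini_reduction(split), chaid_p_value(split))
```

CART prefers the split with the largest impurity reduction, while CHAID prefers the one with the smallest p-value; because `chaid_p_value` accepts any number of child groups, it scores multi-way splits directly.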

Gamma Regression - used in much the same situations as linear regression.  Whereas linear regression assumes normality of the target variable and error term, gamma regression assumes the target follows a gamma distribution.  The main practical difference is that gamma regression is useful when the dependent variable is right-skewed.  The dependent variable must also be strictly positive; if any values are zero or negative, a common workaround is to add a constant so that all values become positive.
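Gamma regression is usually fit as a generalized linear model with a log link. The sketch below, a hypothetical `fit_gamma_log_link` for a single predictor, uses iteratively reweighted least squares (IRLS). With the log link and the gamma variance, every IRLS weight equals 1, so each iteration reduces to an ordinary least-squares fit of the working response. It is an illustration under those assumptions, not LityxIQ's estimation routine.

```python
import math


def fit_gamma_log_link(xs, ys, iterations=25):
    """Fit log(E[y]) = b0 + b1 * x for a gamma GLM by IRLS (Fisher scoring)."""
    if any(y <= 0 for y in ys):
        raise ValueError("gamma regression needs a strictly positive target")
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    etas = [math.log(y) for y in ys]  # start from mu = y
    b0 = b1 = 0.0
    for _ in range(iterations):
        mus = [math.exp(eta) for eta in etas]
        # Working response for the log link: z = eta + (y - mu) / mu.
        zs = [eta + (y - mu) / mu for eta, y, mu in zip(etas, ys, mus)]
        # All IRLS weights equal 1 here, so this step is plain least squares of z on x.
        z_bar = sum(zs) / n
        b1 = sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs)) / sxx
        b0 = z_bar - b1 * x_bar
        etas = [b0 + b1 * x for x in xs]
    return b0, b1


# Right-skewed, strictly positive example data (made up for illustration).
xs = [1, 2, 3, 4, 5, 6]
ys = [1.2, 2.0, 2.9, 5.1, 7.4, 12.8]
b0, b1 = fit_gamma_log_link(xs, ys)
print(f"log(mu) = {b0:.3f} + {b1:.3f} * x")
```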

Neural Net - Neural networks are typically organized in layers. Layers are made up of a number of interconnected 'nodes', each of which contains an 'activation function'. Patterns are presented to the network via the 'input layer', which connects to one or more 'hidden layers' where the actual processing is done through a system of weighted 'connections'. The hidden layers then link to an 'output layer', which produces the answer.

Linear Regression - an approach for modeling the relationship between a continuous dependent variable y and one or more explanatory variables denoted X. The case of one explanatory variable is called simple linear regression; with more than one explanatory variable, the process is called multiple linear regression.  In linear regression, data are modeled using linear predictor functions, and unknown model parameters are estimated from the data. Such models are called linear models.
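For simple linear regression, ordinary least squares (the most common way of estimating the parameters) has a closed form. The sketch below is illustrative; `fit_simple_linear` is a hypothetical name.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx  # covariance of x and y divided by the variance of x
    intercept = y_bar - slope * x_bar  # the fitted line passes through (x_bar, y_bar)
    return intercept, slope


intercept, slope = fit_simple_linear([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept + slope * 5)  # prediction for x = 5 → 11.0
```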

Logistic Regression - a type of probabilistic statistical classification model used for predicting the outcome of a categorical dependent variable based on one or more predictor variables. The probabilities describing the possible outcomes of a single trial are modeled, as a function of the explanatory (predictor) variables, using a logistic function.
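With one predictor, the logistic model sets P(y = 1 | x) = 1 / (1 + e^-(b0 + b1·x)). The sketch below fits b0 and b1 by maximum likelihood using Newton-Raphson. It assumes the two classes overlap (with perfectly separable data the estimates diverge), and `fit_logistic` is an illustrative name rather than the product's implementation.

```python
import math


def sigmoid(t):
    """Logistic function, written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(xs, ys, iterations=25):
    """Maximum-likelihood fit of P(y=1|x) = sigmoid(b0 + b1*x) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        ps = [sigmoid(b0 + b1 * x) for x in xs]
        # Gradient of the log-likelihood.
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Negated Hessian (Fisher information), with weights p * (1 - p).
        w = [p * (1 - p) for p in ps]
        h00 = sum(w)
        h01 = sum(wi * x for wi, x in zip(w, xs))
        h11 = sum(wi * x * x for wi, x in zip(w, xs))
        det = h00 * h11 - h01 * h01
        # Newton step: beta += H^-1 g, using the explicit 2x2 inverse.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)
print(b0, b1, sigmoid(b0 + b1 * 4.5))  # estimated probability that y = 1 at x = 4.5
```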

Naïve Bayes - In machine learning, naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features.  Naive Bayes is a popular (baseline) method for text categorization, the problem of judging documents as belonging to one category or the other (such as spam or legitimate, sports or politics, etc.) with word frequencies as the features. With appropriate preprocessing, it is competitive in this domain with more advanced methods including support vector machines.
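As an illustration of the text-categorization use case, here is a minimal multinomial naive Bayes classifier over word counts with add-one (Laplace) smoothing. The independence assumption shows up as a simple sum of per-word log-probabilities. The tiny training set and the `train_nb`/`predict_nb` names are made up for the example.

```python
import math
from collections import Counter


def train_nb(docs):
    """docs: list of (text, label) pairs. Returns class priors, per-class word counts, vocabulary."""
    class_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in docs:
        word_counts[label].update(text.lower().split())
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    priors = {label: n / len(docs) for label, n in class_counts.items()}
    return priors, word_counts, vocab


def predict_nb(model, text, alpha=1.0):
    """Pick the class maximizing log P(class) + sum of log P(word | class)."""
    priors, word_counts, vocab = model
    scores = {}
    for label, counts in word_counts.items():
        total = sum(counts.values())
        score = math.log(priors[label])
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Add-alpha smoothing so an unseen (word, class) pair does not zero out the product.
            score += math.log((counts[word] + alpha) / (total + alpha * len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


training = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the team on monday", "ham"),
]
model = train_nb(training)
print(predict_nb(model, "free money"), predict_nb(model, "team meeting monday"))
```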

Probit Regression - a type of regression where the dependent variable can take only two values, for example married or not married. It addresses the same set of problems as logistic regression using similar techniques, but it uses the probit link function (the inverse of the standard normal cumulative distribution function) in place of the logit. Probit models are most often estimated using the standard maximum likelihood procedure.
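The probit and logit models differ only in the curve that maps the linear predictor to a probability. This sketch computes both with the standard library; the factor of about 1.6 used to line the two curves up is a commonly quoted rule of thumb for converting probit coefficients to logit scale, not an exact identity.

```python
import math


def probit_probability(eta):
    """Inverse probit link: the standard normal CDF, Phi(eta)."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def logit_probability(eta):
    """Inverse logit link (the logistic function), for comparison."""
    return 1.0 / (1.0 + math.exp(-eta))


# Compare Phi(eta) with logistic(1.6 * eta) over a range of linear-predictor values.
for eta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"eta={eta:+.1f}  probit={probit_probability(eta):.4f}  "
          f"logit(1.6*eta)={logit_probability(1.6 * eta):.4f}")
```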

Random Forest - an ensemble learning method for classification (and regression) that constructs a multitude of decision trees at training time and outputs the class that is the mode of the classes output by the individual trees (for regression, the mean of their predictions).
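The bootstrap-and-vote idea can be sketched in plain Python. To stay short, each "tree" here is a one-split stump built on a bootstrap sample using a single randomly chosen feature; an actual random forest grows much deeper trees and samples candidate features at every split. All function names and the toy data are illustrative.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature):
    """Best single split on one feature, chosen by weighted Gini impurity."""
    best = (gini(labels), None, majority(labels), majority(labels))
    values = sorted(set(row[feature] for row in rows))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for row, y in zip(rows, labels) if row[feature] <= t]
        right = [y for row, y in zip(rows, labels) if row[feature] > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[0]:
            best = (score, t, majority(left), majority(right))
    _, t, left_label, right_label = best
    return feature, t, left_label, right_label


def predict_stump(stump, row):
    feature, t, left_label, right_label = stump
    if t is None:  # no split helped: constant prediction
        return left_label
    return left_label if row[feature] <= t else right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap: sample with replacement
        feature = rng.randrange(len(rows[0]))  # random feature for this tree
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Majority vote (the mode) of the individual trees' predictions."""
    return majority([predict_stump(stump, row) for stump in forest])


rows = [(1, 1), (2, 1), (1, 2), (2, 2), (1.5, 1.5), (8, 8), (9, 8), (8, 9), (9, 9), (8.5, 8.5)]
labels = ["A"] * 5 + ["B"] * 5
forest = fit_forest(rows, labels)
print(predict_forest(forest, (1.2, 1.4)), predict_forest(forest, (8.7, 8.9)))
```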

SVM (Support Vector Machines) - In machine learning, SVMs are supervised learning models with associated learning algorithms that analyze data and recognize patterns; they are used for classification and regression analysis. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making the basic SVM a non-probabilistic binary linear classifier (kernel functions extend it to non-linear boundaries). An SVM model represents the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and assigned to a category based on which side of the gap they fall on.
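A linear SVM can be trained by minimizing the regularized hinge loss, which penalizes points that fall inside the gap or on the wrong side of it. The sketch below uses plain full-batch subgradient descent with fixed, hand-picked settings (`lam`, `lr`, `epochs` are assumptions, not tuned defaults); production SVM solvers use more refined optimizers, and kernels are not covered here.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.05, epochs=500):
    """Soft-margin linear SVM trained by full-batch subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))).  Labels must be +1 or -1."""
    dim = len(points[0])
    n = len(points)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # point is inside the gap or misclassified: hinge loss is active
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_svm(w, b, x):
    """Assign a class by which side of the separating hyperplane x falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print(w, b, predict_svm(w, b, (5, 5)), predict_svm(w, b, (-4, -5)))
```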

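Holt-Winters - an exponential smoothing method that tracks a level, a trend, and a seasonal component; it extends Holt's double exponential smoothing (level and trend) with seasonality and is sometimes called triple exponential smoothing. Exponential smoothing assigns exponentially decreasing weights to past observations, unlike a simple moving average, which weights past observations equally.  http://en.wikipedia.org/wiki/Exponential_smoothing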
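The additive Holt-Winters recursions update a level, a trend, and one seasonal term per period, each by exponential smoothing. In the sketch below the smoothing parameters `alpha`, `beta`, and `gamma` are fixed rather than optimized, and the initialization from the first two seasons is a simple heuristic (an assumption; libraries use a variety of schemes). `holt_winters_additive` is an illustrative name.

```python
def holt_winters_additive(series, season_length, alpha=0.5, beta=0.3, gamma=0.3, horizon=1):
    """Additive Holt-Winters smoothing; returns forecasts for the next `horizon` periods."""
    m = season_length
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons to initialize")
    # Heuristic initialization from the first two seasons.
    first = sum(series[:m]) / m
    second = sum(series[m:2 * m]) / m
    trend = (second - first) / m
    level = first + trend * (m - 1) / 2  # level at the end of the first season
    seasonal = [series[i] - (first + trend * (i - (m - 1) / 2)) for i in range(m)]
    for t in range(m, len(series)):
        y = series[t]
        s = seasonal[t % m]  # seasonal term from one season ago
        prev_level = level
        level = alpha * (y - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % m] = gamma * (y - level) + (1 - gamma) * s
    last = len(series) - 1
    return [level + h * trend + seasonal[(last + h) % m] for h in range(1, horizon + 1)]


# Quarterly series: linear trend plus a repeating seasonal pattern.
pattern = [1, -1, 2, -2]
series = [10 + 2 * t + pattern[t % 4] for t in range(24)]
print(holt_winters_additive(series, 4, horizon=4))
```

On a noise-free trend-plus-season series like this one, the heuristic initialization is exact, so the forecasts continue the pattern; on real data the smoothing parameters control how quickly the level, trend, and seasonal terms adapt.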

Loess Decomposition - STL (Seasonal and Trend decomposition using Loess) is a filtering procedure that decomposes a time series into seasonal, trend, and irregular (remainder) components using loess, a locally weighted regression smoother.  http://cs.wellesley.edu/~cs315/Papers/stl%20statistical%20model.pdf
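STL's core building block is the loess smoother: at each point, fit a weighted straight line to the nearest neighbors, using tricube weights that fall to zero at the edge of the neighborhood. The sketch below implements only that smoother, not the full STL procedure, which applies such smoothing repeatedly to the seasonal sub-series and to the deseasonalized series. `loess` and its `frac` parameter (the share of points in each neighborhood) are illustrative names.

```python
def loess(xs, ys, frac=0.5):
    """Locally weighted linear regression with tricube weights, evaluated at each x."""
    n = len(xs)
    k = min(n, max(2, int(round(frac * n))))  # neighbors used in each local fit
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        ws = []
        for x in xs:
            u = abs(x - x0) / h
            ws.append((1 - u ** 3) ** 3 if u < 1 else 0.0)  # tricube weight
        sw = sum(ws)
        xm = sum(w * x for w, x in zip(ws, xs)) / sw
        ym = sum(w * y for w, y in zip(ws, ys)) / sw
        sxx = sum(w * (x - xm) ** 2 for w, x in zip(ws, xs))
        sxy = sum(w * (x - xm) * (y - ym) for w, x, y in zip(ws, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ym + slope * (x0 - xm))  # local line evaluated at x0
    return fitted


# A trend with alternating noise; the smoother should recover roughly 0.5 * x.
xs = list(range(20))
ys = [0.5 * x + (1 if x % 2 else -1) for x in xs]
print([round(v, 2) for v in loess(xs, ys, frac=0.3)])
```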

ARIMA - Autoregressive integrated moving average.  A form of regression analysis that forecasts a series by modeling the differences between successive values (the "integrated" part) instead of the actual data values, which helps with series that wander like a random walk. Lags of the differenced series are referred to as "autoregressive" terms, and lags of past forecast errors are referred to as "moving average" terms.  http://www.forecastingsolutions.com/arima.html
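A minimal ARIMA(1,1,0) sketch shows the "integrated" and "autoregressive" parts: difference the series once, fit an AR(1) coefficient to the differences, forecast the next differences, and add them back. It omits moving-average terms and a constant, and uses least squares instead of the maximum-likelihood estimation a full ARIMA routine typically uses; the function names are illustrative.

```python
def fit_arima_110(series):
    """Difference once, then fit d_t = phi * d_(t-1) by least squares (no constant)."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    num = sum(d1 * d0 for d0, d1 in zip(diffs, diffs[1:]))
    den = sum(d0 * d0 for d0 in diffs[:-1])
    return num / den


def forecast_arima_110(series, phi, steps):
    """Forecast future levels by projecting the differences and integrating them back."""
    last_value = series[-1]
    last_diff = series[-1] - series[-2]
    out = []
    for _ in range(steps):
        last_diff = phi * last_diff  # AR(1) forecast of the next difference
        last_value += last_diff      # undo the differencing ("integrate")
        out.append(last_value)
    return out


series = [100, 108, 112, 114, 115, 115.5, 115.75, 115.875]
phi = fit_arima_110(series)
print(phi, forecast_arima_110(series, phi, 2))
```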

VAR - Vector Autoregression.  A generalization of the univariate autoregression (AR) model. An AR model explains one variable linearly with its own previous values, while a VAR explains a vector of variables with the vector's previous values. A VAR is a purely statistical tool in the sense that it simply fits the coefficients that best describe the data at hand, so you still need some domain (for example, economic) intuition for why the variables belong in the vector. For instance, you could easily estimate a VAR with a time series of car sales in Germany and temperatures in Australia, but it would be hard to justify the model even if one variable appeared to help predict the other.  http://christophj.github.io/replicating/r/vector-autoregression-var-in-r/
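A VAR(1) with a constant, y_t = c + A·y_(t-1) + e_t, can be estimated equation by equation with ordinary least squares, because every equation uses the same lagged regressors. The plain-Python sketch below does that for any number of variables, using a small Gaussian-elimination helper. The coefficients used to simulate the check data are made up, and `fit_var1`/`forecast_var1` are illustrative names.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_var1(data):
    """data: list of observation vectors. Fits y_t = c + A y_(t-1) by OLS; returns (c, A)."""
    k = len(data[0])
    X = [[1.0] + list(prev) for prev in data[:-1]]  # regressors: constant + lagged vector
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k + 1)] for i in range(k + 1)]
    c, A = [], []
    for eq in range(k):  # one regression per variable, same regressors each time
        y = [obs[eq] for obs in data[1:]]
        xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k + 1)]
        coef = solve(xtx, xty)
        c.append(coef[0])
        A.append(coef[1:])
    return c, A


def forecast_var1(c, A, last):
    """One-step forecast: c + A @ last."""
    return [ci + sum(aij * xj for aij, xj in zip(row, last)) for ci, row in zip(c, A)]


# Simulate a noise-free two-variable VAR(1) and check that the fit recovers it.
true_c = [1.0, 2.0]
true_A = [[0.5, 0.1], [0.2, 0.3]]
data = [[0.0, 0.0]]
for _ in range(12):
    data.append(forecast_var1(true_c, true_A, data[-1]))
c, A = fit_var1(data)
print(c, A, forecast_var1(c, A, data[-1]))
```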
