Concepts

Reasons for NULL or Missing Scores in Scoring Catalog

In some situations, you may notice that scores produced by a scoring job are missing (blank or NULL). There are two common reasons this can occur. These are listed below, with recommendations for resolving the issue. Note that resolving missing scores is not always possible, depending on the data itself and, in some cases, the algorithm being used. Situation 1 - New Value of a Categorical Variable Consider a categorical variable that was used to build the model, and in...

Algorithm Overview - Prophet

Algorithm Description Prophet is a time series forecasting algorithm based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality. Prophet is robust to missing data, outliers, and shifts in trend. The algorithm is also designed to have intuitive parameters that can be adjusted without knowing the details of the underlying model. Note that in LityxIQ, the date/ordering variable specified in the model settings should be a Date variable because...

Algorithm Overview - K-Means

Algorithm Description The K-Means algorithm is used to perform "unsupervised" clustering of a dataset. The term "unsupervised" means that there is no target variable to guide the analysis. Only predictor (or independent) variables are used. The variables are analyzed in a way that groups dataset records together when they are similar (relative to the variables involved) and puts records in different groups when they are dissimilar. Normally, the similarity of records is determi...
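The grouping idea described in the K-Means excerpt can be sketched in a few lines. This is a minimal illustration, not the LityxIQ implementation: it assumes Euclidean distance as the similarity measure and uses hypothetical sample points and starting centers.

```python
import math

def kmeans(points, centers, iterations=10):
    """Plain K-Means: assign each point to its nearest center, then recompute centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Assumption: similarity is measured with Euclidean distance.
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its members (keep it if the cluster is empty).
        centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else center
            for members, center in zip(clusters, centers)
        ]
    return centers, clusters

# Hypothetical records described by two predictor variables, and two starting centers.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers, clusters = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(centers)
print(clusters)
```

No target variable appears anywhere in the sketch; only the predictor values drive the grouping.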

Algorithm Overview - SVM

Algorithm Description SVM is an acronym for Support Vector Machine. The objective of the SVM algorithm is to draw a line (in fact, likely a very complex multi-dimensional curve) that does a good job of separating the data in a way that minimizes error. For example, in a classification problem (see picture below), finding a line that best separates green from blue dots is pretty straightforward. That line, when determined in this case, can be used to determine whether a record is "blue" or "gr...

Algorithm Overview - Naive Bayes

Algorithm Description Naïve Bayes is a probabilistic classification algorithm based on Bayes' Theorem. Bayes' Theorem lets us find the probability of an event given that another event has occurred. With this algorithm, we must assume that each feature makes an independent and equal contribution to the analysis. This translates to the assumptions that no pair of features is dependent, and each feature equally contributes to the classification. Using Bayes' Theorem, th...
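As a rough illustration of the independence assumption in the Naïve Bayes excerpt, the sketch below multiplies a class prior by per-feature likelihoods and normalizes the result. The class names and probability values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def naive_bayes_posterior(priors, likelihoods):
    """Posterior per class = prior * product of feature likelihoods, then normalized."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        # Independence assumption: feature likelihoods are simply multiplied together.
        for likelihood in likelihoods[cls]:
            score *= likelihood
        scores[cls] = score
    total = sum(scores.values())
    return {cls: score / total for cls, score in scores.items()}

# Hypothetical values: P(class) and P(feature value | class) for two observed features.
priors = {"responder": 0.4, "non_responder": 0.6}
likelihoods = {"responder": [0.8, 0.5], "non_responder": [0.1, 0.5]}
posterior = naive_bayes_posterior(priors, likelihoods)
print(posterior)
```

Here the unnormalized scores are 0.16 and 0.03, so the "responder" class receives 0.16 / 0.19 of the total probability.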

Algorithm Overview - Neural Networks

Algorithm Description LityxIQ’s “Neural Net” and “DeepNet” algorithms are both “deep networks” because they have at least one hidden layer. The Neural Net algorithm is recommended for beginner users, and the DeepNet algorithm is recommended for users wanting to experiment with more parameters. In this document, a description will be provided about neural networks, followed by the differences in parameters that each algorithm employs. The terms "Neural Net", “Deep Learning”, “Deep Neural Net”,...

Algorithm Overview - CART/CHAID

Algorithm Description CART and CHAID are both Decision Tree machine learning algorithms. Their objective is to find quantitative splits (segments) of the dataset that do a good job of differentiating the dataset with respect to the target variable. These segments are created by iteratively splitting the dataset based on key values of the most important predictor variables. Most decision tree algorithms differ with respect to how they determine the most important predictors, and the key v...

Algorithm Overview - XGBoost

Algorithm Description Extreme Gradient Boosting (XGBoost) is a decision-tree-based machine learning algorithm used for classification and regression problems. The Gradient Boosting algorithm builds decision trees sequentially (instead of in parallel and independently, like Random Forest) such that each subsequent tree aims to reduce the errors of the previous tree. Each tree learns from its predecessor and updates the residual errors. Hence, the tree that grows next in the sequence is learni...
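The sequential, residual-driven idea in the XGBoost excerpt can be sketched with one-split trees ("stumps") on a single predictor. This is a simplified squared-error gradient boosting loop, not XGBoost itself (it has no regularization or second-order terms); the data, the learning rate, and the function names are illustrative assumptions.

```python
def mean_squared_error(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)

def fit_stump(xs, residuals):
    """Find the one-threshold split that best fits the residuals (squared error)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def boost(xs, ys, rounds=10, learning_rate=0.5):
    """Each new stump is trained on the residual errors left by the previous ones."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    stumps = []
    errors = [mean_squared_error(ys, predictions)]
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x) for p, x in zip(predictions, xs)]
        errors.append(mean_squared_error(ys, predictions))

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    return predict, errors

# Hypothetical training data: one predictor, one continuous target.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.2]
predict, errors = boost(xs, ys)
print(errors[0], errors[-1])
```

The training error recorded after each round never increases, which is the sense in which each tree "reduces the errors" of the ones before it.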

Algorithm Overview - Linear and Logistic Regression

Algorithm Description Linear and Logistic Regression are supervised machine learning techniques which investigate the relationship between a dependent variable (target) and independent variable(s) (predictors). A Linear Regression model focuses on predicting a continuous target, while a Logistic Regression model aims to predict a binary target (e.g. 1/0, True/False, Yes/No). Both techniques can have continuous or discrete predictors. Linear Regression Linear Regression establishes ...
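To make the continuous-versus-binary distinction concrete, the sketch below fits a one-predictor least-squares line and shows the logistic function that maps a linear score to a probability between 0 and 1. The data and function names are illustrative assumptions, not LityxIQ settings.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for a single predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

def logistic(z):
    """Map a linear score z to a probability, as a logistic regression does."""
    return 1.0 / (1.0 + math.exp(-z))

# Linear regression: continuous target (hypothetical data lying on y = 2x + 1).
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)

# Logistic regression: the same kind of linear score becomes a probability of the "1" class.
probability = logistic(0.0)
print(probability)
```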

Algorithm Overview - Random Forest

Algorithm Description The Random Forest algorithm is a supervised machine learning technique that uses many individual decision trees to form a "forest" or ensemble. Random Forests are trained using a method called ‘bagging’, which uses randomly sampled subsets of the data to train each decision tree. This method helps reduce variance in the model. The ‘bagging’ method is also applied to the feature space, where only a random subset of features is considered at each split in each decision tre...
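The two sources of randomness mentioned in the Random Forest excerpt can be sketched as follows. This shows only the sampling steps, assuming sampling with replacement for the rows; the feature names, sizes, and seed are hypothetical, and no trees are actually grown.

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) records with replacement (one bagging sample for one tree)."""
    return [rng.choice(rows) for _ in rows]

def random_feature_subset(features, k, rng):
    """Pick k distinct features to consider at a single split."""
    return rng.sample(features, k)

rng = random.Random(42)  # fixed seed so the sketch is repeatable
rows = list(range(10))   # hypothetical record ids
features = ["age", "income", "tenure", "region"]  # hypothetical predictor names

for tree_index in range(3):
    sample = bootstrap_sample(rows, rng)
    # In a real forest a new subset is drawn at every split; one is shown per tree here.
    subset = random_feature_subset(features, 2, rng)
    print(tree_index, sorted(sample), subset)
```

Because each tree sees a different sample and different candidate features, the trees make different errors, which is what allows averaging them to reduce variance.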

What Can You Do in Predict?

Predict contains four main areas which are available in the left panel of LityxIQ after clicking Predict. These are described below in more detail. Models - Use this link to build, manage, and execute predictive models. Here you can create, define, analyze, schedule, compare, approve, and implement many types of models. You can also manage model versioning here. Scoring Jobs - Use this link to create, schedule, and execute scoring jobs. Scoring Catalogs - Use this link to create and m...

Building Models for Novice Users

For building predictive models, Predict is a unique platform for novice or business users. Powerful and accurate models can be built, maintained, and implemented with little effort. Technical aspects of model building can be left to the platform. As the novice user practices and gains more knowledge about the modeling process, they then have the option of working with some of the more advanced options and parameters. However, even the novice user needs to have a firm grasp of some key concept...

Types of Models

In Predict, predictive models are created with a business objective in mind. The technical aspects of the model, such as statistical algorithms and options, are optional and will be discussed in a separate article. When creating a new model (see https://support.lityxiq.com/338307-Creating-a-New-Predictive-Model for more information), there are several types to choose from. The options available may differ from those shown below. - Affinity - used to predict an individual's affinity (...

Machine Learning Algorithms

LityxIQ supports a number of machine learning algorithms. Different algorithms tend to perform well for different situations, often depending on the dataset itself. Below is a list of supported algorithms, as well as the types of models supported for each, and a link to documentation for the algorithm that includes an overview of the settings available. Algorithm Continuous-value target Binary target Unsupervised Clustering Time-series Documentation Linear ...

Algorithm Overview - Forecasting Algorithms

Forecasting is used to predict time series data using trends and seasonality factors. Examples include sales data, web traffic data, or social media activity data. The target variable is numeric and stored in time sequence. More information can be found at http://en.wikipedia.org/wiki/Forecasting, and another good resource is the site https://otexts.com/fpp2/. LityxIQ Forecasting Algorithms Holt-Winters - a type of double exponential smoothing where exponential smoothing assigns exponential...

C-Statistic and ROC Curves

The C-Statistic, also called Concordance Statistic or C-Stat, is a common metric used to analyze the performance of binary classification models, and to compare multiple models to one another. Specifically, the C-statistic is computed as the area under the ROC curve. The minimum value of c is 0.0 and the maximum is 1.0. Values of 0.7 to 0.8 are generally considered to show acceptable discrimination, values of 0.8 to 0.9 excellent discrimination, and values of 0.9 or above outstanding discrimination. An RO...
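The area under the ROC curve equals the share of (positive, negative) record pairs in which the positive record receives the higher score, with ties counted as half. The sketch below computes the C-statistic that way; the labels and scores are hypothetical, and the function name is an assumption for illustration.

```python
def c_statistic(labels, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative record")
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))

# Hypothetical binary targets and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
c_value = c_statistic(labels, scores)
print(c_value)
```

In this example 8 of the 9 positive/negative pairs are ordered correctly, so the C-statistic is 8/9. A model that ranks every positive above every negative scores 1.0, and one that scores at random is expected to land near 0.5.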