Model Performance Analysis for Unsupervised Clustering Models

Unsupervised clustering and segmentation models have some performance analysis results that are unique to this type of model.  For general information about Performance Analysis for LityxIQ machine learning models, see the general Performance Analysis documentation.  Some analysis types mentioned there are not available for clustering models.

When analyzing performance for clustering models, the unique analysis types that become available are:


Cluster Statistics

This provides a high-level summary of the clusters that were created.  For each cluster, it shows the percentage of the dataset that falls in the cluster, along with the Within SSE (sum of squared errors) and MSE (mean squared error).  Both measure the variation within the cluster.  Relatively high values indicate a cluster whose data records are very diverse.  Relatively low values indicate a cluster whose members are very similar to each other across many of the variables.
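
As a rough illustration of what these statistics measure, the sketch below computes them for a small set of records that have already been assigned to clusters.  It is not LityxIQ's implementation: the function name cluster_statistics is hypothetical, the centroid is assumed to be the per-variable mean, and the MSE is assumed to be the SSE divided by the number of values in the cluster (records times variables), which may differ from the divisor LityxIQ uses.

```python
from collections import defaultdict


def cluster_statistics(rows, labels):
    """Summarize clusters. rows: list of numeric lists; labels: cluster id per row."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)

    stats = {}
    for label, members in groups.items():
        n = len(members)
        n_vars = len(members[0])
        # Centroid: mean of each variable over the cluster's members.
        centroid = [sum(column) / n for column in zip(*members)]
        # Within SSE: squared distance of every value from its centroid value.
        sse = sum(
            (value - center) ** 2
            for row in members
            for value, center in zip(row, centroid)
        )
        stats[label] = {
            "pct_of_dataset": 100.0 * n / len(rows),
            "within_sse": sse,
            # Assumption: MSE averages the SSE over all values in the cluster.
            "within_mse": sse / (n * n_vars),
        }
    return stats


rows = [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0], [10.0, 10.0]]
labels = ["A", "A", "B", "B"]
for label, values in cluster_statistics(rows, labels).items():
    print(label, values)
```

In this example, cluster B contains two identical records, so its SSE and MSE are zero, while cluster A has nonzero values because its members differ from each other.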


Cluster Profiles - Tabular

This output provides a high-level profile of the members of each cluster, based on the variables used for the clustering.  For numeric variables, it shows the average value in the cluster.  For categorical variables, it shows the most common value in the cluster or, if more than one value was highly common, it may show two or three common values.
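
The profiling described above can be sketched as follows.  This is only an approximation of the idea: the function name profile_clusters and the min_share and max_values parameters are hypothetical, and the rule for deciding when a second or third value counts as "highly common" (here, at least 25% of the cluster) is an assumption, since the actual rule is not documented here.

```python
from collections import Counter, defaultdict


def profile_clusters(records, labels, min_share=0.25, max_values=3):
    """records: list of dicts (variable name -> value); labels: cluster id per record."""
    groups = defaultdict(list)
    for record, label in zip(records, labels):
        groups[label].append(record)

    profiles = {}
    for label, members in groups.items():
        profile = {}
        for name in members[0]:
            values = [member[name] for member in members]
            if all(isinstance(v, (int, float)) for v in values):
                # Numeric variable: average value in the cluster.
                profile[name] = sum(values) / len(values)
            else:
                # Categorical variable: most common value(s) in the cluster.
                counts = Counter(values).most_common(max_values)
                # Assumed rule: keep values held by at least min_share of members.
                common = [v for v, c in counts if c / len(values) >= min_share]
                # Always keep at least the single most common value.
                profile[name] = common or [counts[0][0]]
        profiles[label] = profile
    return profiles


records = [
    {"age": 30, "region": "East"},
    {"age": 40, "region": "East"},
    {"age": 50, "region": "West"},
    {"age": 60, "region": "West"},
    {"age": 70, "region": "West"},
]
labels = [0, 0, 1, 1, 1]
print(profile_clusters(records, labels))
```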


Cluster Profiles - Numeric

This output provides a chart that compares the average value of a numeric variable across clusters.  The variable displayed can be changed using the dropdown in the upper left corner of the chart.


Cluster Profiles - [categorical variable]

Each categorical variable in the clustering model has its own cluster profile view, with the variable name shown in the menu selection.  For the most common values of the variable, the chart shows the percentage of each cluster's members that have each value.
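
The percentages behind a chart like this can be sketched as shown below.  The function name categorical_profile and the top_n parameter are hypothetical, and the number of "most common" values that LityxIQ actually displays is not stated here, so the cutoff is an assumption.

```python
from collections import Counter


def categorical_profile(values, labels, top_n=5):
    """values: the categorical variable per record; labels: cluster id per record."""
    # Most common values of the variable across the whole dataset.
    top_values = [value for value, _ in Counter(values).most_common(top_n)]
    cluster_sizes = Counter(labels)
    pair_counts = Counter(zip(labels, values))
    # For each cluster, the percentage of its members having each top value.
    return {
        label: {
            value: 100.0 * pair_counts[(label, value)] / size
            for value in top_values
        }
        for label, size in cluster_sizes.items()
    }


values = ["East", "East", "West", "West", "West", "North"]
labels = [0, 0, 0, 1, 1, 1]
print(categorical_profile(values, labels, top_n=2))
```

Because only the most common values are included, the percentages for a cluster do not necessarily sum to 100.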

Note that for clustering models, the Performance Summary analysis type reports very different metrics, including the within mean square and sum of squares, as well as the value of the criterion used to help identify clusters.  See the related LityxIQ documentation for more information on these.