KMeansND() evaluates the rows of the chart by applying k-means clustering and, for each chart row, displays the cluster id of the cluster that the data point has been assigned to. The columns used by the clustering algorithm are determined by the parameters coordinate_1, coordinate_2, and so on, up to n columns. These are all aggregations. The number of clusters created is determined by the num_clusters parameter.
KMeansND returns one value per data point. The returned value is a dual and is the integer value corresponding to the cluster each data point has been assigned to.
num_clusters
The integer that specifies the number of clusters.
num_iter
The number of iterations of clustering with reinitialized cluster centers.
coordinate_1
The aggregation that calculates the first coordinate, usually the x-axis of the scatter chart that can be made from the chart. The additional parameters calculate the second, third, fourth, and any further coordinates.
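To make the roles of the arguments concrete, the following is a minimal Python sketch of k-means with restarts, not the Qlik Sense implementation. It assumes that num_iter restarts the clustering from new random centers and that the run with the lowest within-cluster sum of squares is kept; the text above does not state how the best run is selected. The function names, the sample rows, and the 0-based cluster ids are all illustrative choices for this sketch.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points]


def kmeans_once(points, k, rng, max_steps=100):
    """One k-means run (Lloyd's algorithm) from randomly chosen starting centers."""
    centers = rng.sample(points, k)
    labels = assign(points, centers)
    for _ in range(max_steps):
        # Move each center to the mean of its members; keep it if it has none.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
        new_labels = assign(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
    cost = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, cost


def kmeans_nd(num_clusters, num_iter, points, seed=0):
    """Run k-means num_iter times with fresh centers; keep the lowest-cost labels.

    Assumption of this sketch: "best" means lowest within-cluster sum of squares.
    """
    rng = random.Random(seed)
    runs = [kmeans_once(points, num_clusters, rng) for _ in range(num_iter)]
    best_labels, _ = min(runs, key=lambda run: run[1])
    return best_labels


# Illustrative rows only: (petal length, petal width, sepal length, sepal width).
rows = [
    (1.4, 0.2, 5.1, 3.5),
    (1.3, 0.2, 4.7, 3.2),
    (1.5, 0.3, 5.0, 3.4),
    (4.7, 1.4, 7.0, 3.2),
    (4.5, 1.5, 6.4, 3.2),
    (4.9, 1.5, 6.9, 3.1),
]
cluster_ids = kmeans_nd(2, 5, rows)
print(cluster_ids)  # one cluster id per row
```

Each tuple plays the role of one chart row, and each position in the tuple corresponds to one coordinate_n aggregation. The sketch returns one cluster id per row, in the same way that KMeansND returns one value per data point.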
In this example, we create a scatter plot chart using the Iris dataset, and then use KMeansND to color the data by expression.
We also create a variable for the num_clusters argument, and then use a variable input box to change the number of clusters.
Additionally, we create a variable for the num_iter argument, and then use a second variable input box to change the number of iterations.
The Iris data set is publicly available in a variety of formats. We have provided the data as an inline table to load using the data load editor in Qlik Sense. Note that we added an Id column to the data table for this example.
After loading the data in Qlik Sense, we do the following:
Drag a Scatter plot chart onto a new sheet. Name the chart Petal (color by expression).
Create a variable to specify the number of clusters. For the variable Name, enter KmeansPetalClusters. For the variable Definition, enter =2.
Create a variable to specify the number of iterations. For the variable Name, enter KmeansNumberIterations. For the variable Definition, enter =1.
Configure Data for the chart:
Under Dimensions, choose id for the field for Bubble. Enter Cluster Id for the Label.
Under Measures, choose Sum([petal.length]) for the expression for X-axis.
Under Measures, choose Sum([petal.width]) for the expression for Y-axis.
Data settings for Petal (color by expression) chart
The data points are plotted on the chart.
Data points on Petal (color by expression) chart
Configure Appearance for the chart:
Under Colors and legend, choose Custom for Colors.
Choose to color the chart By expression.
Enter the following for Expression: kmeansnd($(KmeansPetalClusters),$(KmeansNumberIterations), Sum([petal.length]), Sum([petal.width]),Sum([sepal.length]), Sum([sepal.width]))
Note that KmeansPetalClusters is the variable that we set to 2. KmeansNumberIterations is the variable that we set to 1.
Alternatively, enter the following, which uses the same values directly instead of the variables: kmeansnd(2, 1, Sum([petal.length]), Sum([petal.width]),Sum([sepal.length]), Sum([sepal.width]))
Deselect the check box for The expression is a color code.
Enter the following for Label: Cluster Id
Appearance settings for Petal (color by expression) chart
The two clusters on the chart are colored by the KMeansND expression.
Clusters colored by expression on Petal (color by expression) chart
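The color expression above refers to the two variables using the $(VariableName) syntax. As a rough illustration of what happens before the expression is evaluated, the following Python sketch models that expansion as plain text substitution. This is a simplification: the expand function and the variables dictionary are hypothetical, and the actual expansion in Qlik Sense handles more cases than this sketch does.

```python
import re


def expand(expression, variables):
    """Replace each $(Name) with the text of that variable's value."""
    return re.sub(r"\$\((\w+)\)", lambda m: str(variables[m.group(1)]), expression)


# Values of the two variables created earlier in this example.
variables = {"KmeansPetalClusters": 2, "KmeansNumberIterations": 1}

expression = (
    "kmeansnd($(KmeansPetalClusters),$(KmeansNumberIterations), "
    "Sum([petal.length]), Sum([petal.width]),"
    "Sum([sepal.length]), Sum([sepal.width]))"
)

expanded = expand(expression, variables)
print(expanded)
# kmeansnd(2,1, Sum([petal.length]), Sum([petal.width]),Sum([sepal.length]), Sum([sepal.width]))
```

Because the values are substituted before evaluation, changing a variable changes the num_clusters or num_iter argument without editing the expression itself.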
Add a Variable input box for the number of clusters.
Under Custom objects in the Assets panel, choose Qlik Dashboard bundle. If we did not have access to the dashboard bundle, we could still change the number of clusters using the variable that we created, or directly as an integer in the expression.
Drag a Variable input box onto the sheet.
Under Appearance, click General.
Enter the following for Title: Clusters
Click Variable.
Choose the following variable for Name: KmeansPetalClusters.
Choose Slider for Show as.
Choose Values, and configure the settings as required.
Appearance for Clusters variable input box
Add a Variable input box for the number of iterations.
Drag a Variable input box onto the sheet.
Under Appearance, choose General.
Enter the following for Title: Iterations
Under Appearance, choose Variable.
Choose the following variable under Name: KmeansNumberIterations.
Configure the additional settings as required.
We can now change the number of clusters and iterations using the sliders in the variable input boxes.
Clusters colored by expression on Petal (color by expression) chart
Auto-clustering
KMeans functions support auto-clustering using a method called depth difference (DeD). When the number of clusters is set to 0, the function determines an optimal number of clusters for the dataset. Although the number of clusters (k) is not explicitly returned, it is calculated within the KMeans algorithm. For example, if the value of KmeansPetalClusters is set to 0, either directly in the function or through a variable input box, cluster assignments are calculated automatically based on an optimal number of clusters. For the Iris dataset, setting the number of clusters to 0 causes the algorithm to determine (auto-cluster) an optimal number of clusters, which is 3.
The KMeans depth difference method determines the optimal number of clusters when (k) is set to 0.
Iris data set: Inline load for data load editor in Qlik Sense