KMeans2D - chart function

KMeans2D() evaluates the rows of the chart by applying k-means clustering, and for each chart row displays the cluster id of the cluster this data point has been assigned to. The columns that are used by the clustering algorithm are determined by the parameters coordinate_1, and coordinate_2, respectively. These are both aggregations. The number of clusters that are created is determined by the num_clusters parameter. Data can be optionally normalized by the norm parameter.

KMeans2D returns one value per data point. The returned value is a dual and is the integer value corresponding to the cluster each data point has been assigned to.

Syntax:

KMeans2D(num_clusters, coordinate_1, coordinate_2 [, norm])

Return data type: dual

Arguments:

Arguments
Argument	Description
num_clusters	Integer that specifies the number of clusters.
coordinate_1	The aggregation that calculates the first coordinate, usually the x-axis of the scatter chart that can be made from the chart. The additional parameter, coordinate_2, calculates the second coordinate.
norm	The optional normalization method applied to datasets before KMeans clustering. Possible values: 0 or ‘none’ for no normalization 1 or ‘zscore’ for z-score normalization 2 or ‘minmax’ for min-max normalization If no parameter is supplied or if the supplied parameter is incorrect, no normalization is applied. Z-score normalizes data based on feature mean and standard deviation. Z-score does not ensure each feature has the same scale but it is a better approach than min-max when dealing with outliers. Min-max normalization ensures that the features have the same scale by taking the minimum and maximum values of each and recalculating each datapoint.

Auto-clustering

KMeans functions support auto-clustering using a method called depth difference (DeD). When a user sets 0 for the number of clusters, an optimal number of clusters for that dataset is determined. Note that while an integer for the number of clusters (k) is not explicitly returned, it is calculated within the KMeans algorithm. For example, if 0 is specified in the function for the value of KmeansPetalClusters or set through a variable input box, cluster assignments are automatically calculated for the dataset based on an optimal number of clusters.

Did this page help you?

If you find any issues with this page or its content – a typo, a missing step, or a technical error – please let us know!

Leave your feedback here

KMeans2D - chart function

Auto-clustering

Did this page help you?

Join the Analytics Modernization Program