KMeans2D() evaluates the rows of the chart by applying k-means clustering and, for each chart row, displays the ID of the cluster to which that data point has been assigned. The columns used by the clustering algorithm are determined by the parameters coordinate_1 and coordinate_2, both of which are aggregations. The number of clusters created is determined by the num_clusters parameter. Data can optionally be normalized using the norm parameter.
KMeans2D returns one value per data point. The returned value is a dual and is the integer value corresponding to the cluster each data point has been assigned to.
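To make the "one cluster ID per data point" behavior concrete, the following Python sketch runs plain Lloyd-style k-means on two lists of already aggregated coordinates. It illustrates the general technique only and is not the product's implementation. The function name `kmeans_2d`, the `max_iter` parameter, and the choice to seed the centroids with the first distinct points are assumptions made for this example, and the sketch does not cover the automatic mode that is selected with 0 clusters.

```python
from math import dist


def kmeans_2d(num_clusters, xs, ys, max_iter=100):
    """Return one integer cluster id per (x, y) point (illustrative sketch)."""
    if num_clusters < 1:
        raise ValueError("this sketch needs num_clusters >= 1")
    points = list(zip(xs, ys))
    # Assumption: seed centroids with the first distinct points so that
    # the result is deterministic. Real implementations may seed differently.
    centroids = list(dict.fromkeys(points))[:num_clusters]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels


# Two well-separated groups of three points each.
xs = [1.0, 1.2, 0.8, 9.0, 9.2, 8.8]
ys = [1.0, 0.9, 1.1, 9.0, 9.1, 8.9]
labels = kmeans_2d(2, xs, ys)
```

In this usage example the first three points end up sharing one cluster ID and the last three share another, which mirrors how each chart row receives exactly one ID.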
KMeans2D(num_clusters, coordinate_1, coordinate_2 [, norm])
Return data type: dual
|num_clusters||Integer that specifies the number of clusters.|
|coordinate_1||The aggregation that calculates the first coordinate, usually the x-axis of a scatter chart built from the chart data. The additional parameter, coordinate_2, calculates the second coordinate.|
|norm||The optional normalization method applied to the dataset before k-means clustering. Possible values:|
0 or 'none' for no normalization
1 or 'zscore' for z-score normalization
2 or 'minmax' for min-max normalization
If no parameter is supplied or if the supplied parameter is incorrect, no normalization is applied.
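The fallback rule above can be sketched in Python as follows. This is a minimal illustration of the documented mapping, not the product's code. The helper name `resolve_norm` is hypothetical, and the sketch assumes exact lowercase string matching, which the description above does not specify.

```python
def resolve_norm(norm=None):
    """Map a norm argument to 'none', 'zscore', or 'minmax' (sketch)."""
    by_code = {0: "none", 1: "zscore", 2: "minmax"}
    # Accept the documented string names as they are.
    # Assumption: matching is exact and lowercase.
    if isinstance(norm, str) and norm in by_code.values():
        return norm
    # Accept the documented integer codes (bool is excluded on purpose).
    if isinstance(norm, int) and not isinstance(norm, bool) and norm in by_code:
        return by_code[norm]
    # Missing or unrecognized values fall back to no normalization.
    return "none"
```

With this helper, `resolve_norm(1)` and `resolve_norm('zscore')` select the same method, while `resolve_norm()` and `resolve_norm('bogus')` both fall back to `'none'`.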
Z-score normalization rescales each feature using its mean and standard deviation. It does not guarantee that every feature ends up on the same scale, but it is a better approach than min-max normalization when dealing with outliers.
Min-max normalization ensures that the features have the same scale by taking the minimum and maximum values of each feature and rescaling every data point relative to that range.
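The two methods can be compared with a short Python sketch that uses the textbook formulas. The function names `zscore` and `minmax` are chosen for this example. The use of the population standard deviation, the [0, 1] target range for min-max, and the handling of constant features are assumptions, because the description above does not state which variants are applied.

```python
from statistics import mean, pstdev


def zscore(values):
    """Center on the mean and divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]  # assumption: constant feature maps to 0
    return [(v - mu) / sigma for v in values]


def minmax(values):
    """Rescale so that the minimum maps to 0 and the maximum maps to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumption: constant feature maps to 0
    return [(v - lo) / (hi - lo) for v in values]


scaled = minmax([2, 4, 6])          # evenly spaced values
squeezed = minmax([1, 2, 3, 100])   # one large outlier
standardized = zscore([1, 2, 3])
```

In the `squeezed` example the outlier 100 maps to 1 and pushes the other three values close to 0, which is the sensitivity to outliers mentioned above.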
KMeans functions support auto-clustering using a method called depth difference (DeD). When the number of clusters is set to 0, an optimal number of clusters for the dataset is determined automatically. The resulting number of clusters (k) is not explicitly returned, but it is calculated within the KMeans algorithm. For example, if 0 is specified in the function for the value of KmeansPetalClusters, or set through a variable input box, cluster assignments are automatically calculated for the dataset based on an optimal number of clusters.