KMeans2D - chart function

KMeans2D() evaluates the rows of the chart by applying k-means clustering and, for each chart row, displays the ID of the cluster to which that data point has been assigned. The columns used by the clustering algorithm are determined by the parameters coordinate_1 and coordinate_2, both of which are aggregations. The number of clusters created is determined by the num_clusters parameter. Data can optionally be normalized using the norm parameter.

KMeans2D returns one value per data point. The returned value is a dual whose integer value identifies the cluster to which the data point has been assigned.
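The idea of assigning one cluster ID per data point can be sketched in Python. This is a minimal illustration of the standard k-means (Lloyd's) procedure on two coordinates, not the implementation behind KMeans2D. The function name `kmeans_2d`, the random seeding of centroids, and the iteration cap are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two (x, y) points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans_2d(points, num_clusters, iterations=100, seed=0):
    """Return one integer cluster id per (x, y) point.

    Hypothetical helper: initial centroids are sampled from the data,
    which is an assumption, not a documented behavior of KMeans2D.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, num_clusters)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(num_clusters),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(num_clusters):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels


points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1),
          (10.0, 10.0), (10.2, 9.8), (9.9, 10.1)]
labels = kmeans_2d(points, 2)
```

In this sketch the two well-separated groups receive different ids, but which group gets 0 and which gets 1 depends on the initial centroids, so the ids themselves carry no ordering.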

Syntax:  

KMeans2D(num_clusters, coordinate_1, coordinate_2 [, norm])

Return data type: dual

Arguments:  

num_clusters

Integer that specifies the number of clusters.

coordinate_1

The aggregation that calculates the first coordinate, usually the x-axis of a scatter chart built from the chart's data.

coordinate_2

The aggregation that calculates the second coordinate.

norm

The optional normalization method applied to datasets before KMeans clustering.

Possible values:

0 or 'none' for no normalization

1 or 'zscore' for z-score normalization

2 or 'minmax' for min-max normalization

If no parameter is supplied or if the supplied parameter is incorrect, no normalization is applied.
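The fallback rule above can be expressed as a small lookup. The following Python sketch is illustrative only: `resolve_norm` is a hypothetical helper, and matching the string values exactly (case-sensitively) is an assumption, since the text does not say how KMeans2D treats case.

```python
def resolve_norm(norm=None):
    """Map a norm argument to 'none', 'zscore', or 'minmax'.

    Anything missing or unrecognized falls back to 'none',
    mirroring the rule described in the text.
    """
    by_number = {0: "none", 1: "zscore", 2: "minmax"}
    if isinstance(norm, str) and norm in by_number.values():
        return norm
    # bool is excluded so that True/False are not read as 1/0 (an assumption).
    if isinstance(norm, int) and not isinstance(norm, bool) and norm in by_number:
        return by_number[norm]
    return "none"
```

For example, `resolve_norm(1)` and `resolve_norm('zscore')` select the same method, while `resolve_norm(7)` and `resolve_norm()` both select no normalization.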

Z-score normalization rescales each feature using its mean and standard deviation. It does not ensure that every feature ends up on the same scale, but it is a better approach than min-max normalization when dealing with outliers.
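As a rough illustration, z-score normalization of a single feature could look like the Python sketch below. The use of the population standard deviation and the handling of a constant feature are assumptions; the text does not specify either.

```python
from statistics import mean, pstdev


def zscore(values):
    """Rescale values to zero mean and unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumed)
    if sigma == 0:
        # A constant feature carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


z = zscore([2, 4, 6])
```

After this transformation the values have mean 0 and standard deviation 1, but their minimum and maximum still depend on the shape of the original distribution.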

Min-max normalization ensures that the features have the same scale by taking the minimum and maximum values of each feature and rescaling every data point relative to that range.
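A minimal Python sketch of min-max normalization for one feature follows. Mapping onto the interval from 0 to 1 is the common convention and is assumed here; the text does not state the target range.

```python
def minmax(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no range; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = minmax([10, 20, 30])  # → [0.0, 0.5, 1.0]
```

Because the range is set by the extreme values, a single outlier compresses everything else: for the input `[1, 2, 3, 100]`, the first three values all land in roughly the bottom 2% of the interval.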