Scatter plot
The scatter plot presents pairs of values from two or three measures.
This is useful when each instance in your data has two numbers, for example, the relationship between Sales and Quantity per Customer. In the scatter plot below, a third measure (Cost) determines the bubble size.
The scatter plot presents values from different measures over one dimension as a collection of points. In most charts the dimension is on one of the axes, but in a scatter plot the dimension is represented by the points in the chart, and the measures are on the two axes. When a third, optional measure is used, its value is reflected in the bubble size. If you are analyzing large data sets and viewing compressed data, the density of the data points is reflected by color.
When to use it
The scatter plot helps you find potential relationships between values and spot outliers in data sets. It is useful when you want to show data where each instance has at least two metrics, for example, average life expectancy and average gross domestic product per capita in different countries.
Advantages
The scatter plot is a great way to visualize the correlation of two or more measures at the same time. The third measure is an efficient way of differentiating between values and simplifying the identification of, for example, large countries, large customers, large quantities, and so on.
Disadvantages
The scatter plot may be difficult for an inexperienced user to understand, because it has measure values on both axes, and the third, optional measure adds complexity to the interpretation. Make sure a novice can interpret the scatter plot correctly. Using descriptive labels is a good way to make the visualization easier to interpret.
Values may be placed on top of each other and are then not visible until you zoom in.
Display limitations
Large data sets in scatter plots
You can set the maximum number of visible bubbles in a scatter plot if you are using advanced edit mode. Go to Presentation > Max visible bubbles. The default is 2,500. The maximum is 50,000. If you set a number less than 1,000, the scatter plot will behave as if the maximum is 1,000 visible bubbles.
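The limits above can be illustrated with a minimal Python sketch. The function and constant names are hypothetical and are not part of the product. The sketch also assumes that a value above 50,000 is capped at 50,000, which the text does not state explicitly.

```python
# Hypothetical helper: the constants mirror the limits described above.
DEFAULT_MAX_VISIBLE_BUBBLES = 2_500
MIN_EFFECTIVE_BUBBLES = 1_000
MAX_ALLOWED_BUBBLES = 50_000


def effective_max_visible_bubbles(setting: int = DEFAULT_MAX_VISIBLE_BUBBLES) -> int:
    """Return the limit the chart would apply for a given Max visible bubbles setting."""
    # Settings below 1,000 behave as if the limit were 1,000.
    # Assumption: settings above 50,000 are capped at 50,000.
    return max(MIN_EFFECTIVE_BUBBLES, min(setting, MAX_ALLOWED_BUBBLES))
```

For example, a setting of 500 would be treated as 1,000, while a setting of 3,000 would be used unchanged.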
If the number of displayed data points is fewer than Max visible bubbles, the data is shown as individual bubbles. If there are more data points than the number set in Max visible bubbles, you see an overview of your data set as a table with colored boxes. The switch between compressed view and bubble view happens automatically. If there are more than 5,000 visible bubbles, bubble labels and out-of-bounds bubbles are not shown.
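The switching rules above can be summarized as a small decision function. This is only a sketch: `describe_view` and its return keys are hypothetical names, and the behavior when the point count exactly equals Max visible bubbles is an assumption, because the text covers only the "fewer" and "more" cases.

```python
LABEL_CUTOFF = 5_000  # above this many visible bubbles, labels are not shown


def describe_view(point_count: int, max_visible_bubbles: int = 2_500) -> dict:
    """Describe how a scatter plot with point_count displayed points would render."""
    # Assumption: a count exactly equal to the limit still renders as bubbles.
    if point_count > max_visible_bubbles:
        # Compressed view shows colored boxes, so there are no bubbles to label.
        return {"mode": "compressed", "labels": False, "out_of_bounds": False}
    # Bubble view: labels and out-of-bounds bubbles only up to 5,000 bubbles.
    show_details = point_count <= LABEL_CUTOFF
    return {"mode": "bubbles", "labels": show_details, "out_of_bounds": show_details}
```

With the default limit, 2,000 points would render as labeled bubbles and 3,000 points as a compressed view. With the limit raised to 10,000, 6,000 points would render as bubbles without labels.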
Zooming and panning
In a scatter plot, you can zoom and pan around in your data. The interaction differs depending on the device you are using. When you zoom in, the mini chart in the bottom right corner shows where in the data set you are located. If you zoom in on large data sets, the data is shown as boxes with values inside. The values represent the number of points in each box.
You cannot make selections when the scatter plot is rendering during a pan or zoom.
Zooming and panning are not possible when you have made a selection in the compressed data view.
You can change the compression resolution in the visual exploration menu or in the properties panel.
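One way to picture the boxes with counts is as cells of a grid laid over the visible area, with the compression resolution corresponding to how fine the grid is. The sketch below illustrates that idea only; it is not the product's actual algorithm, and `bin_points` and `cells_per_axis` are hypothetical names. It assumes both axis ranges have a non-zero width.

```python
from collections import Counter


def bin_points(points, x_range, y_range, cells_per_axis=4):
    """Count how many (x, y) points fall into each cell of a square grid."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    counts = Counter()
    for x, y in points:
        # Skip points outside the visible area.
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue
        # Map the coordinate to a cell index; the top edge goes in the last cell.
        col = min(int((x - x_min) / (x_max - x_min) * cells_per_axis), cells_per_axis - 1)
        row = min(int((y - y_min) / (y_max - y_min) * cells_per_axis), cells_per_axis - 1)
        counts[(col, row)] += 1
    return dict(counts)


points = [(0.1, 0.1), (0.2, 0.2), (0.9, 0.9), (1.0, 1.0), (2.0, 2.0)]
print(bin_points(points, (0, 1), (0, 1), cells_per_axis=2))
# prints {(0, 0): 2, (1, 1): 2}
```

A higher `cells_per_axis` gives smaller boxes with lower counts, which is the trade-off a resolution setting would control in this model.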
Colors and legends
Compressed scatter plots are always auto-colored by the primary data color, so custom color definitions do not affect compressed data. The density of the data points is reflected by color: denser data is displayed in a darker color.
However, if you zoom or make selections so that the number of displayed data points is reduced to fewer than Max visible bubbles, the data is colored as defined.
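The density coloring of compressed data can be sketched as a mapping from a box's point count to a shade of a single base color. The function name, the base color, and the linear darkening scale below are all assumptions chosen for illustration; the product's actual color scale is not described here.

```python
def density_shade(count, max_count, base_rgb=(70, 130, 180)):
    """Return a darker shade of base_rgb for boxes that hold more points."""
    if max_count <= 0:
        raise ValueError("max_count must be positive")
    # Normalize the count to a density between 0.0 and 1.0.
    density = min(max(count / max_count, 0.0), 1.0)
    # Assumption: brightness falls linearly from 100% to 30% as density rises.
    factor = 1.0 - 0.7 * density
    return tuple(round(channel * factor) for channel in base_rgb)
```

In this model, an empty box keeps the base color and the densest box gets the darkest shade, so every box uses the same hue regardless of any custom color definition.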
When coloring by dimension is used in this chart, a maximum of 100 unique dimension values and 100 unique colors can be plotted on the legend.