A Scatterplot uses dots to represent values for two different numeric variables. The position of each dot indicates values for an individual data point along the axes. Dots can also be color-coded and sized based on other variables.
Technical details
Description
Scatterplots are most useful when both the x-axis and the y-axis show continuous numeric variables. The x-axis can be made categorical, but the y-axis is always on a numeric scale.
Inputs
DATA SOURCE
1. Scatterplots accept tables supplied using either Paste or type data or existing output in 'Pages'. These are expected to be tables where each row of the input data is shown as a separate point.
Columns 1 and 2: control the x and y coordinates, respectively.
Column 3: if provided, controls the sizes of the points.
Column 4: if provided, controls the colors of the points.
Column .... : Additional columns in the table can be referred to for use with annotations.
Rownames: When the input table contains rownames, these will be used as the data labels.
- Users can add entries to Columns to ignore to skip specific columns.
- If multiple tables are selected, each one is expected to be in the same format as described above, and the row names and column names must be the same across all tables. Note that the default format of the input data for Scatterplots is different from other visualizations, so Row/Column manipulations may not behave as expected. In these cases, you may want to select Input data contains y-values in multiple columns.
2. Alternatively, the user can assign X coordinates, Y coordinates, Sizes and Colors to be variables or outputs. This option is more flexible because each of these 4 components can be separately assigned instead of being extracted from the same table. However, it is also more complicated because the behavior may change slightly depending on the inputs chosen.
Inputs are variables. This is the simplest use case; a marker is shown for each entry in the variables (i.e. the variables are expected to be the same length).
Inputs are tables. If the tables are simple 1-column tables, they behave exactly the same as variables. However, where they have additional attributes, the chart will attempt to use these as well. If the tables have row labels, these are used as the labels of the data points. It is also possible to explicitly use the row labels as X coordinates by selecting Use category labels instead of values. If this is selected and a banner is used, the span labels are used instead of the row labels. If the Y coordinates input is a 2-dimensional table, the columns are treated as separate data series (i.e. shown in different colors). If the Y coordinates table contains multiple statistics, these may be used in the annotations.
X or Y coordinates are a Standard R Regression model output. In this case, either the regression coefficients or the importance scores are used as the data input. This is useful in particular for creating Quad Maps from a Driver Analysis output.
- Input data contains y-values in multiple columns. When this is selected, each cell in the input table is shown as a separate point. The values in the table are used as the y-coordinates, whereas the x-coordinates are taken from the row labels. Each column is shown as a separate group, with the colors of the groups controlled by the color palette (under Data series). All points are shown with the same size. If the table contains multiple statistics, these can be used to add annotations to the chart.
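The two table layouts described above can be sketched in Python. This is an illustration only, not Displayr's implementation: the function names `points_from_table` and `points_from_wide_table` and the dictionary keys are hypothetical, and tables are represented as plain lists of rows.

```python
from typing import Any, Dict, List, Sequence


def points_from_table(
    row_names: Sequence[str],
    column_names: Sequence[str],
    rows: Sequence[Sequence[Any]],
) -> List[Dict[str, Any]]:
    """Default layout: one point per row.

    Column 1 is x, column 2 is y, column 3 (optional) is size,
    column 4 (optional) is color; any further columns are kept
    so they can be referred to by annotations.
    """
    points = []
    for label, row in zip(row_names, rows):
        points.append({
            "label": label,  # row names become the data labels
            "x": row[0],
            "y": row[1],
            "size": row[2] if len(row) > 2 else None,
            "color": row[3] if len(row) > 3 else None,
            "extra": dict(zip(column_names[4:], row[4:])),
        })
    return points


def points_from_wide_table(
    row_names: Sequence[str],
    column_names: Sequence[str],
    rows: Sequence[Sequence[Any]],
) -> List[Dict[str, Any]]:
    """'y-values in multiple columns' layout: one point per cell.

    The row label is the x-coordinate, the cell value is the
    y-coordinate, and the column name identifies the group.
    """
    points = []
    for row_label, row in zip(row_names, rows):
        for group, y in zip(column_names, row):
            points.append({"x": row_label, "y": y, "group": group})
    return points
```

For a table with 2 rows and 2 columns, the first function yields 2 points and the second yields 4, which is why Row/Column manipulations behave differently between the two layouts.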
Chart
APPEARANCE
- Show labels on chart or as hovertext. The second option (default) handles large datasets better and offers more charting options. However, for small datasets the first option is better at showing a moderate number of labels on the chart. The labeled scatterplot automatically positions the labels to avoid overlap, but users can also drag labels to move them and click on markers to hide or show labels. These changes are remembered on recalculation, but are reset when the input data changes.
DATA LABELS
- Automatically position data labels. When labels are shown on chart, the data labels are sometimes automatically positioned to the side of the actual point for better visibility. Unchecking this will place the data label on top of the data point.
- Maximum number of labels to plot. This option limits the number of labels shown when the labels are shown on chart. It is useful when there are many points with overlapping labels. The remaining points will be shown without labels.
- Treat sizes variable as area or diameter. If the input data contains a sizes variable, the points will be shown with either the area or the diameter of the points proportional to the absolute value of the size variable (e.g. Example 3). The sizes variable can either be specified as a variable in the Sizes dropdown option, or as the third numeric column of a linked output or pasted table. The variable used for the Sizes will be coerced into a numeric vector (which may not make sense for character or categorical variables).
- Treat colors variable as categories or numeric scale. If the input data contains a colors variable, the color of each point is determined by the value of the corresponding entry in the colors variable. The colors can be shown as categories (e.g. Example 2), or vary continuously over a numeric scale (e.g. Example 3). The colors variable can either be specified as a variable in the Colors dropdown option, or as the fourth (numeric) column of a linked output or pasted table.
- Show bubble legend. Show a legend associating the size of the points with the values in the sizes variable. Note that the bubble legend is only shown when a sizes variable is used and the labels are shown on chart (not as hovertext).
- Show trend lines. This option is only available when Show labels is set to On chart. If multiple tables are given, then trend lines are shown between corresponding points in the different tables. The order of the tables determines the order of lines between points. Otherwise, trend lines are added between consecutive points in the same group (Treat colors variable must be set to Categories). If trend lines are used then Sizes are ignored.
- Logos. A comma-separated list of URLs to be used instead of labels. Not available when the input data is supplied as raw variables.
- Logo size. The size of the logos, specified in increments of 0.1.
- Marker size. If the input data does not contain a sizes variable, the marker size specifies the marker diameter in pixels. If a sizes variable is provided, marker size is used to scale the markers. Specifically, the largest marker will have a diameter of 50/6 * marker size. (For the default value of 6, the largest circle will be approximately half an inch in diameter.)
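The interaction between Treat sizes variable as area or diameter and Marker size can be illustrated with a short Python sketch. It is not Displayr's code: the function name `marker_diameters` is hypothetical, and it assumes that sizes are scaled relative to the largest absolute value so that the largest marker has a diameter of 50/6 * marker size, as stated above.

```python
import math
from typing import List, Sequence


def marker_diameters(
    sizes: Sequence[float],
    marker_size: float = 6.0,
    treat_as: str = "area",
) -> List[float]:
    """Return a diameter in pixels for each entry of the sizes variable.

    Assumption: the entry with the largest absolute value is drawn with
    a diameter of 50/6 * marker_size, and the others are scaled so that
    either their diameter or their area is proportional to |size|.
    """
    magnitudes = [abs(s) for s in sizes]
    largest = max(magnitudes)
    max_diameter = 50.0 * marker_size / 6.0
    if largest == 0:
        return [0.0 for _ in magnitudes]
    if treat_as == "diameter":
        # Diameter proportional to |size|.
        return [max_diameter * m / largest for m in magnitudes]
    if treat_as == "area":
        # Area proportional to |size|, so diameter goes with sqrt(|size|).
        return [max_diameter * math.sqrt(m / largest) for m in magnitudes]
    raise ValueError("treat_as must be 'area' or 'diameter'")
```

With sizes of 1 and 4 and the default marker size, the smaller marker is half the diameter of the larger one under "area" but a quarter of it under "diameter", so "area" generally avoids visually exaggerating differences.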
ANNOTATION (only when labels are shown as hovertext)
- Annotation. The type of annotation to add. This can be one of Arrow - up, Arrow - down, Marker border, Text - after data label, Text - before data label, Border, Hide, or Shadow. The last three are only shown if text is added first.
- Data. If the input data is variables from a dataset, then the data used for the annotation can be a variable selected using a combo box. If the input data is a table, then type in the column name of the data to use.
- Threshold. Enter a threshold value to select values to annotate. If the data is numeric, setting the threshold to -Inf or Inf will select all (non-missing) values. For date variables, the threshold is parsed as a date string where possible. For an ordered categorical variable, type in the name of one of the levels.
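How a numeric threshold selects points for annotation can be sketched as follows. This is a hedged illustration only: the function name `select_for_annotation`, the `direction` parameter, and the use of strict comparisons are assumptions, since the comparison rule is not spelled out above.

```python
import math
from typing import List, Optional, Sequence


def select_for_annotation(
    values: Sequence[Optional[float]],
    threshold: float,
    direction: str = "above",
) -> List[int]:
    """Return the indices of the values that would be annotated.

    Assumptions: missing values (None or NaN) are never selected, and
    'above' / 'below' use strict comparisons against the threshold.
    """
    if direction not in ("above", "below"):
        raise ValueError("direction must be 'above' or 'below'")
    selected = []
    for index, value in enumerate(values):
        if value is None or math.isnan(value):
            continue  # skip missing data
        if direction == "above" and value > threshold:
            selected.append(index)
        elif direction == "below" and value < threshold:
            selected.append(index)
    return selected
```

Under these assumptions, a threshold of -Inf with "above" (or Inf with "below") selects every non-missing value, which matches the behavior described for numeric data.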
Output
When the input is specified using variables, all cases are plotted:
When the input is specified using a table, the cell values are plotted:
Using the same table, you can select DATA MANIPULATION > Input data contains y-values in multiple columns to make the row headers the x-axis:
Using a table as the input, multiple columns can be used to create a colored bubble chart:
More examples of Scatterplots with different types of input can be found here.
More Information
Using scatterplots to chart trends
Adding logos to scatterplots in Displayr
Method
- In Displayr: How to create a Scatter Plot
- In Q: How to Create a Scatterplot in Q