A Sankey diagram shows the flows between the categories of different variables, or between the values in different columns of a table. It is generally advisable to show only a small number of variables/columns. Please see the Sankey articles on our blog and Help Center for examples of how to set up data for Sankey diagrams.
Object Inspector Options
The following is an explanation of the options available in the Object Inspector for this specific visualization. Refer to Visualization Options for general chart formatting options.
Data (in Displayr)/Input (in Q) tab
Data Source
There are three options for inserting data into a Sankey diagram:
- Input table A table with each row describing a set of linked categories.
- Variables Categorical variables from a Data set.
- Paste or type data (only available in Q) Enter a table with each row describing a set of linked categories.
Maximum number of categories The maximum number of categories to display for each variable.
Filters & Weight
Weight A dropdown that takes a numeric variable to control the size of each link. This option is only available when the Variables data source is used. For the other data sources, select the last column contains weights checkbox instead.
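The way rows of linked categories might be turned into links can be sketched in Python. This is a minimal illustration, not the Displayr or Q implementation. The function name `rows_to_links` and its `last_column_is_weight` parameter are hypothetical. The sketch assumes each row lists one category per column, optionally followed by a numeric weight (mirroring the last column contains weights checkbox). Each pair of adjacent categories in a row becomes a link, and identical links are summed.

```python
from collections import defaultdict

def rows_to_links(rows, last_column_is_weight=False):
    """Aggregate rows of linked categories into weighted links (hypothetical helper).

    Each row is a sequence such as ("Male", "18-34", "Yes"). If
    last_column_is_weight is True, the final element is a numeric weight;
    otherwise every row counts as 1.
    Returns a dict mapping (column_index, source, target) -> total weight,
    where source is in column column_index and target in the next column.
    """
    links = defaultdict(float)
    for row in rows:
        if last_column_is_weight:
            *categories, weight = row
        else:
            categories, weight = list(row), 1.0
        # Link each category to the category in the next column of the same row.
        for i in range(len(categories) - 1):
            links[(i, categories[i], categories[i + 1])] += float(weight)
    return dict(links)
```

The column index is part of each key because the same label (for example "Yes") can appear in more than one column and should be a separate node in each.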
Chart tab
Appearance
Links colored by
- None: all links are shown in grey.
- Source: links are shown in the same color as their source node (on the left).
- Target: links are shown in the same color as their target node (on the right).
- First variable: similar to Source, but each node also takes the color of the node it is linked to on the left. If there are several such nodes, the color is taken from the node with the largest link weight.
- Last variable: similar to First variable, but each node takes the color of the node it is linked to on the right (following downstream links), and links take the color of their Target node.
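The First variable rule can be read as a left-to-right propagation of colors. The sketch below assumes that reading, and it assumes links are stored as `(column_index, source, target) -> weight`. `first_variable_colors` and its `default` fallback color are hypothetical and are not part of Displayr, Q or networkD3.

```python
def first_variable_colors(links, first_column_colors, default="grey"):
    """Propagate node colors from left to right (sketch of the 'First variable' idea).

    links: dict mapping (column_index, source, target) -> weight.
    first_column_colors: dict mapping each category in column 0 -> color.
    Returns a dict mapping (column_index, label) -> color.
    """
    colors = {(0, label): color for label, color in first_column_colors.items()}
    n_columns = max((col for col, _, _ in links), default=-1) + 2
    for col in range(n_columns - 1):
        # For each node in the next column, keep the upstream source with the
        # largest link weight (the first one seen wins a tie).
        best = {}
        for (c, source, target), weight in links.items():
            if c == col and (target not in best or weight > best[target][1]):
                best[target] = (source, weight)
        for target, (source, _) in best.items():
            # Fall back to a default color if the source was never colored.
            colors[(col + 1, target)] = colors.get((col, source), default)
    return colors
```

Under this reading, each link would then be drawn in the color of its source node, as with the Source option. A Last variable sketch would mirror this by walking the columns from right to left.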
Variables share common values Select this if the variables have the same set of values, so that each value is shown in the same color in every variable of the Sankey diagram.
Node colors / Node and link colors Customize the colors of the nodes.
Node width Controls the width of the nodes.
Vertical spacing between nodes Controls the padding between nodes in the same column.
Order nodes to reduce overlap The vertical positions of the nodes are automatically adjusted to reduce the overlap between links. When this is turned off, nodes are positioned in the order they occur in the data.
Place right-most nodes at the edge Forces the nodes in the last column onto the right edge of the widget. The labels of these nodes are then placed to the left of the nodes.
Labels
Font family Font family of node labels.
Font size Font size of node labels.
Include variable in node label Prefix each node label with the variable name or label.
Include counts in node label Append the number of observations in each category to the node label.
Include percentages in node label Append the percentage of observations in each category to the node label.
Variable names Display variable names in the node labels when the Variables data source is used.
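To show what the counts and percentages represent, the sketch below tallies each category within its column from unweighted rows. The function name `node_labels` and the label format (for example `"Male 2 50%"`) are assumptions; the labels in Displayr and Q may be formatted differently, and weights are ignored here.

```python
from collections import Counter

def node_labels(rows, include_counts=True, include_percentages=True):
    """Build a label for each (column_index, category) node (hypothetical helper).

    Each row holds one category per column and counts as one observation.
    Percentages are computed within each column.
    """
    counts = Counter()
    for row in rows:
        for col, category in enumerate(row):
            counts[(col, category)] += 1
    n_rows = len(rows)
    labels = {}
    for (col, category), count in counts.items():
        parts = [str(category)]
        if include_counts:
            parts.append(str(count))
        if include_percentages:
            # Every row has exactly one category per column, so before
            # rounding the percentages within a column sum to 100.
            parts.append(f"{100 * count / n_rows:.0f}%")
        labels[(col, category)] = " ".join(parts)
    return labels
```

For example, with four rows of which three have "Yes" in the second column, that node would be labelled "Yes 3 75%".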
Tidy labels Extract common prefixes from the node labels.
Label maximum length Maximum number of characters shown in a node label before it is truncated. Truncated labels end with an ellipsis. Labels of numeric variables are not truncated.
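A minimal sketch of the truncation rule, assuming the ellipsis is appended after the first `max_length` characters (whether the ellipsis counts toward the limit is not stated above). `truncate_label` and its `is_numeric` flag are hypothetical names.

```python
def truncate_label(label, max_length, is_numeric=False):
    """Shorten a node label to max_length characters plus an ellipsis (sketch).

    Labels of numeric variables, and labels already short enough, are
    returned unchanged.
    """
    if is_numeric or len(label) <= max_length:
        return label
    return label[:max_length] + "…"
```

For example, `truncate_label("Strongly agree", 8)` gives `"Strongly…"`.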
Hovertext
Show percentages instead of counts Show percentages instead of counts in the hovertext (tooltips) for nodes and links.
Technical details
- An error will occur if more than 20 variables are selected. It is generally advisable to show a relatively small number (e.g., 4 or 5).
- Although the Sankey diagrams described here show flows between the values of variables, Sankey diagrams can be used to show many other types of flows, such as migration patterns, regression trees, and energy flows (see https://christophergandrud.github.io/networkD3/).
Acknowledgments
Uses a variant of the networkD3 htmlwidget, created by Kent Russell.