Correspondence analysis represents a table as a scatterplot, where the row and column names are shown on the chart.
Technical details
Description
Our blog contains several posts about correspondence analysis. Included are this introduction, this piece about interpretation of the output, and this more technical description.
Inputs
- Input table(s) The name of the table(s) containing data to be analyzed. Each table should contain only a single statistic (e.g., Total %). If a table contains more than one statistic, the one shown first will be used in the analysis. For example, if you have a table showing Column % first and Total % second, then Column % will be used (whereas Total % is the more orthodox choice). If multiple tables are selected, the correspondence analysis of each table will be shown on the same plot. The row and column names of all selected tables must be identical.
- Note that if the supplied table is a drag-and-drop table containing "Correlation" as the first statistic, then 1 is automatically added to each value in the table. This ensures that no value is negative, so that the assumptions of correspondence analysis are met.
- Paste or type table As an alternative to Input table(s), data can be manually entered or pasted. If this option is used, only a single table can be entered.
- Trend lines When multiple tables are used as input, there is an option to show trend lines between corresponding points across different tables.
- Switch rows and columns Whether or not to transpose the input data source.
- Output:
  - Scatterplot
  - Bubble Chart
    - Bubble sizes A numeric vector of sizes for the bubbles with names equal to the row labels.
    - Bubble colors A numeric vector of values with names equal to the row labels. A divergent color scale will be constructed using the range of the values as end points. The center of the color scale can be either the median of the values or zero. Bubbles will be colored according to the corresponding value. The colors at the ends of the color scale can be specified in controls under the Chart tab.
    - Bubble legend title Title of the legend showing bubble sizes.
  - Moonplot
  - Text produces output in standard coordinates
  - Input Table
- Normalization The method used to normalize the coordinates of the correspondence analysis chart. This blog post explains the differences between the normalization options. Options are:
  - Principal (default option) charts the principal coordinates (i.e., the standard coordinates multiplied by the singular values) for both rows and columns.
  - Row principal charts rows in principal coordinates and columns in standard coordinates.
  - Row principal (scaled) is as Row principal except columns are scaled by the first singular value so as to appear on a similar scale to rows.
  - Column principal charts columns in principal coordinates and rows in standard coordinates.
  - Column principal (scaled) is as Column principal except rows are scaled by the first singular value so as to appear on a similar scale to columns.
  - Symmetrical (½) charts the standard coordinates multiplied by the square roots of the singular values for both rows and columns.
  - None charts the standard coordinates for both rows and columns.
- Focus The label of a row or column to focus the output. The axes will be rotated so that the label lies along the first dimension. This means that the entirety of the variance due to the label is visible in a 2-dimensional plot. This is useful if the analysis is intended to explain the relationship between the focus label and all other labels, rather than the general relationship between all labels. Note that the first dimension will no longer explain the maximum amount of variance. The second dimension explains the maximum amount of remaining variance whilst remaining perpendicular to the first dimension.
- Supplementary A comma-delimited list of rows and/or columns which are not used to fit the low-dimensional space, but are plotted in the space. This article describes the uses of supplementary points.
- Horizontal dimension, Vertical dimension The dimensions to plot on the horizontal and vertical axes respectively. Since dimensions are output in order of decreasing variance, the first and second dimensions are usually of most interest.
- Flip horizontally, Flip vertically Whether to reverse (i.e. invert the sign of) the output coordinates for the specified dimension. This may allow better visualization, especially when comparing maps that are similar apart from reflections.
- Rows to ignore, Columns to ignore The names of any rows or columns to be removed from the table prior to analysis.
- Use logos for rows When this option is selected, the user can replace the labels in the scatterplot with logos. The logos should be supplied as a comma-separated list of URLs.
- Maximum row labels to plot, Maximum column labels to plot These options limit the number of labels shown. This is useful when there are many points with overlapping labels. The remaining points will be shown without labels.
- Chart title Optional title for the scatterplot or bubble chart.
- Custom legend labels Labels used for the row and column in the legend.
- Row legend label, Column legend label Optional labels to be shown in the legend for the row and column projections on a scatter or bubble chart.
- Row series color, Column series color Color of the points shown in the labelled scatterplot or bubble chart for a single table.
- Color palette Control colors used for labelled scatterplot or bubble chart when multiple tables are used.
- Title font size Font size of the chart title.
- X-axis title font size Font size of the horizontal axis title.
- Y-axis title font size Font size of the vertical axis title.
- Labels font size Font size of the labels on the scatterplot.
- Axis labels font size Font size of the labels on the x- and y-axis.
- Legend font size Font size of the legend.
- Show gridlines Whether to display gridlines on the plot.
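As a rough illustration of the Supplementary option described above, the sketch below fits the space from the active rows only and then projects a held-out row using the standard transition formula (the row's profile multiplied by the standard column coordinates). The table values and the row name New are hypothetical, and the base R code is an assumption about the general technique rather than the code this feature runs.

```r
# Hypothetical counts; row "New" is treated as supplementary.
N <- matrix(c(20,  5, 10,
               8, 15,  7,
               4,  9, 22,
              12, 11,  6),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("A", "B", "C", "New"), c("Fun", "Bold", "Calm")))
active <- N[rownames(N) != "New", , drop = FALSE]

# Fit the space using the active rows only
P  <- active / sum(active)   # correspondence matrix
r  <- rowSums(P)             # row masses
cm <- colSums(P)             # column masses
S  <- diag(1 / sqrt(r)) %*% (P - r %o% cm) %*% diag(1 / sqrt(cm))
dec <- svd(S, nu = 2, nv = 2)

col.std       <- diag(1 / sqrt(cm)) %*% dec$v                      # standard column coordinates
row.principal <- diag(1 / sqrt(r)) %*% dec$u %*% diag(dec$d[1:2])  # principal row coordinates

# Project the supplementary row: profile times standard column coordinates
profile       <- N["New", ] / sum(N["New", ])
new.principal <- profile %*% col.std
```

Because the supplementary row has no influence on the singular value decomposition, adding or removing it leaves the positions of all other points unchanged.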
Additional options are available by editing the code.
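To make the Normalization options above more concrete, here is a minimal base R sketch of how the different coordinate sets relate to one another. It uses a small hypothetical table and computes the singular value decomposition directly, instead of calling the ca package; the variable names are illustrative and are not properties of the output.

```r
# Hypothetical brand-by-attribute counts
N <- matrix(c(20,  5, 10,
               8, 15,  7,
               4,  9, 22,
              12, 11,  6),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("A", "B", "C", "D"), c("Fun", "Bold", "Calm")))

P  <- N / sum(N)   # correspondence matrix
r  <- rowSums(P)   # row masses
cm <- colSums(P)   # column masses

# Standardized residuals and their singular value decomposition
S   <- diag(1 / sqrt(r)) %*% (P - r %o% cm) %*% diag(1 / sqrt(cm))
k   <- min(dim(N)) - 1          # number of non-trivial dimensions
dec <- svd(S, nu = k, nv = k)
sv  <- dec$d[1:k]               # singular values

# None: standard coordinates for rows and columns
row.std <- diag(1 / sqrt(r))  %*% dec$u
col.std <- diag(1 / sqrt(cm)) %*% dec$v
dimnames(row.std) <- list(rownames(N), paste0("Dim", 1:k))
dimnames(col.std) <- list(colnames(N), paste0("Dim", 1:k))

# Principal: standard coordinates multiplied by the singular values
row.principal <- sweep(row.std, 2, sv, "*")
col.principal <- sweep(col.std, 2, sv, "*")

# Row principal (scaled): rows as in row.principal,
# columns in standard coordinates times the first singular value
col.scaled <- col.std * sv[1]

# Symmetrical (1/2): standard coordinates times the square roots of the singular values
row.sym <- sweep(row.std, 2, sqrt(sv), "*")
col.sym <- sweep(col.std, 2, sqrt(sv), "*")
```

Row principal pairs row.principal with col.std, and the column variants mirror this with the roles of rows and columns swapped.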
DIAGNOSTICS
- Quality Table Creates a table containing measures of the quality of a correspondence analysis.
Output
Input Example: A crosstab or table with labelled categories in the rows and columns (e.g., brands in the rows and attributes in the columns).
Example output Scatter: Correspondence analysis relating cola brands to their personalities. Where points are close together, you can move the labels around on the visualization. For guidance on how to interpret a correspondence map please see this article: How to Interpret Correspondence Analysis Plots (It Probably Isn’t the Way You Think).
Example output Text: The Text output shows some of the underlying detail from the model. The principal inertias (eigenvalues) are the squared canonical correlations (the correlations between the row and column variable sets within each dimension).
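The relationship between the principal inertias and the canonical correlations can be checked with a short base R sketch. The table is hypothetical and the calculation is done by hand, not taken from the Text output; it also shows that the total inertia equals the chi-square statistic of the table divided by the table total.

```r
# Hypothetical brand-by-attribute counts
N <- matrix(c(20,  5, 10,
               8, 15,  7,
               4,  9, 22,
              12, 11,  6),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("A", "B", "C", "D"), c("Fun", "Bold", "Calm")))

P  <- N / sum(N)   # correspondence matrix
r  <- rowSums(P)   # row masses
cm <- colSums(P)   # column masses

# Standardized residuals (element-wise form)
S <- (P - r %o% cm) / sqrt(r %o% cm)

# Singular values: the canonical correlation for each dimension
sv <- svd(S)$d[1:(min(dim(N)) - 1)]

inertia   <- sv^2                           # principal inertias (eigenvalues)
explained <- 100 * inertia / sum(inertia)   # percentage of inertia per dimension

# Pearson chi-square statistic of the same table, for comparison
chisq <- chisq.test(N)$statistic
```

Here sum(inertia) matches chisq / sum(N), which is why a table with a weak row-by-column association produces small principal inertias.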
Additional Properties
When using this feature, you can obtain additional information stored in the output by inspecting it with custom R code in an item below:
#change YourReferenceName to the reference name (under Properties > General) of your analysis
item = YourReferenceName
str(item)
Properties which may be of interest are:
- Row coordinates:
item$row.coordinates # row coordinates
- Column coordinates:
item$column.coordinates # column coordinates
- Combine row and column coordinates into a single object:
# combined row/column coordinates
dimensions = rbind(item$row.coordinates, item$column.coordinates)
- Just take the first 2 dimensions (columns) (appropriate for export into a scatterplot):
dimensions[,1:2]
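If the exported coordinates need to keep their labels and to record whether each point is a row or a column, the steps above could be extended along the following lines. This is only a sketch: the item list here is a hypothetical stand-in with made-up values so that the code runs on its own, and it assumes the coordinate matrices carry the labels as row names.

```r
# Hypothetical stand-in for the analysis output.
# In practice, use: item = YourReferenceName
item <- list(
  row.coordinates = matrix(c(0.41, -0.18, -0.25, 0.12, 0.30, -0.44), nrow = 3,
                           dimnames = list(c("Brand A", "Brand B", "Brand C"),
                                           c("Dim1", "Dim2"))),
  column.coordinates = matrix(c(0.35, -0.29, 0.08, -0.15), nrow = 2,
                              dimnames = list(c("Fun", "Bold"),
                                              c("Dim1", "Dim2")))
)

# Combined row/column coordinates
dimensions <- rbind(item$row.coordinates, item$column.coordinates)

# One record per point: label, whether it is a row or a column, first two dimensions
export <- data.frame(
  label = rownames(dimensions),
  type  = rep(c("Row", "Column"),
              times = c(nrow(item$row.coordinates), nrow(item$column.coordinates))),
  dimensions[, 1:2]
)
rownames(export) <- NULL
```

The type column can then be used to color row and column points differently in the scatterplot.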
Acknowledgements
The R package ca is used to compute the correspondence analysis.
Method
- In Displayr: How to Do Traditional Correspondence Analysis
- In Q: How to Do Traditional Correspondence Analysis