Dimension Reduction - Principal Components Analysis – Technical Documentation

Principal Components Analysis is a tool for reducing a large set of variables to a smaller set of variables while retaining as much of the variation in the original data set as possible. See this blog post for an introduction. The new variables for the reduced data set (which contain the component scores) can be saved by running Dimension Reduction - Save Variable(s) - Components/Dimensions or by clicking on the button Data > Save variables. The new variables can then be used in subsequent analyses.

Technical details

Inputs

Variables: The variables, or a question containing variables, that you would like to analyze.

Use correlation matrix: If this is true, the correlation matrix of the data in Variables is used to conduct the PCA. Otherwise, the covariance matrix is used.

Create binary variables from categories: Represents unordered categorical variables as binary variables. Otherwise, their Value Attributes are used. Number - Multi questions are treated according to their numeric values and are not converted to binary.

Rule for selecting components: The method for determining the number of principal components to keep in the analysis:

Kaiser rule: Keep components with eigenvalues greater than 1. If the unscaled covariance matrix is used instead of the correlation matrix, components with eigenvalues greater than the mean eigenvalue are kept.

Eigenvalue over: Keep components with eigenvalues greater than a user-specified number. If the unscaled covariance matrix is used instead of the correlation matrix, components with eigenvalues greater than a multiple of the mean eigenvalue are kept.

Number of components: Manually select the number of components to keep.
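
The eigenvalue-based rules above can be sketched in base R. This is an illustration only, not the code used by the software: it uses the built-in mtcars data set, the helper name n.components and its arguments are hypothetical, and treating the cutoff as a multiple of the mean eigenvalue for a covariance matrix is an assumption based on the descriptions above.

```r
# Illustrative sketch of the eigenvalue-based selection rules (not the product's code).
x <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]

# Eigenvalues of the correlation matrix and of the unscaled covariance matrix
eig.cor <- eigen(cor(x), symmetric = TRUE)$values
eig.cov <- eigen(cov(x), symmetric = TRUE)$values

# Hypothetical helper: count the components whose eigenvalue exceeds the cutoff.
# For a covariance matrix the cutoff is assumed to be a multiple of the mean eigenvalue.
n.components <- function(eigenvalues, cutoff = 1, correlation = TRUE) {
    threshold <- if (correlation) cutoff else cutoff * mean(eigenvalues)
    sum(eigenvalues > threshold)
}

n.components(eig.cor)                       # Kaiser rule, correlation matrix
n.components(eig.cov, correlation = FALSE)  # Kaiser rule, covariance matrix
n.components(eig.cor, cutoff = 0.7)         # "Eigenvalue over" with a cutoff of 0.7
```

The eigenvalues of a correlation matrix sum to the number of variables, so their mean is 1. This is why "greater than 1" and "greater than the mean eigenvalue" are the same rule when the correlation matrix is used.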

Rotation method (see below for more details):

None

Varimax

Quartimax

Equamax

Oblimin

Promax

Delta (Oblimin rotation): A parameter used when performing an Oblimin rotation. The default value is 0.

Kappa (Promax rotation): A parameter used when performing a Promax rotation. The default value is 4.

Missing data: See Missing Data Options.
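
The example Detailed Output further down reports the setting "Use partial data (pairwise correlations)". As a minimal sketch of what pairwise correlations mean, assuming base R's cor() rather than the software's own routine, the following compares them with complete-case correlations on the built-in mtcars data, with a few missing values introduced artificially.

```r
# Illustrative sketch: pairwise versus complete-case correlations in base R.
x <- mtcars[, c("mpg", "disp", "hp", "wt")]
x$mpg[c(1, 5, 9)] <- NA   # artificially introduce missing values
x$hp[c(2, 5)] <- NA

# Each correlation uses every case that is complete for that pair of variables
r.pairwise <- cor(x, use = "pairwise.complete.obs")

# Every correlation uses only the cases with no missing values at all
r.complete <- cor(x, use = "complete.obs")

# Number of cases available for each pair of variables
observed <- 1 - is.na(as.matrix(x))   # 1 where a value is present, 0 where missing
n.pairwise <- crossprod(observed)
range(n.pairwise[upper.tri(n.pairwise)])
```

With pairwise correlations the number of cases can differ from one pair of variables to the next, which is presumably why the example output reports the sample size as a range.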

Output

Loadings Table: Display a table of the component loadings, which is sometimes referred to as a Pattern matrix.

Structure Matrix: Display the structure matrix, which is the loadings matrix multiplied by the correlations between the components.

Variance Explained: Display the eigenvalues of the original, unrotated components, along with the variance explained and the cumulative variance explained.

Component Plot: Display a scatterplot of the loadings of the first two principal components.

Scree Plot: Display a chart of the eigenvalues of the correlation or covariance matrix.

Detailed Output: Show more details on the results, including the loadings, structure matrix, variable communalities, sum of squared loadings, and score weights.

2D Scatterplot: Show the data plotted on axes formed by the first two components, with points labelled according to the Group variable.

Sort coefficients by size: When displaying loadings or the structure matrix, sort the coefficients according to their size.

Suppress small loadings: When displaying loadings or the structure matrix, replace small values with blank spaces to facilitate interpretation.

Absolute value below: In tables, cells with absolute values smaller than this are replaced with blank spaces.

Include labels in plots: Whether the variable labels are included in the Component Plot.

Variable names: Display variable names in the output instead of variable labels.

Group variable: The variable used to group (i.e. label) the points of a 2D Scatterplot.
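
The effect of Sort coefficients by size and Suppress small loadings on a loadings table can be sketched as follows. This is only an illustration on the built-in mtcars data: the helper suppress.small, its min.abs argument, the 0.4 cutoff, and the exact sort order are assumptions, not the software's implementation.

```r
# Illustrative sketch of sorting loadings and blanking out small values.
x <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec", "drat")]
e <- eigen(cor(x), symmetric = TRUE)

# Unrotated loadings of the first two components:
# eigenvectors scaled by the square roots of their eigenvalues
L <- e$vectors[, 1:2] %*% diag(sqrt(e$values[1:2]))
dimnames(L) <- list(names(x), c("Component 1", "Component 2"))

# Hypothetical helper: sort the rows, then blank out small values for display
suppress.small <- function(loadings, min.abs = 0.4) {
    # Group variables by the component they load on most strongly, then
    # order by decreasing absolute loading within each group
    top <- apply(abs(loadings), 1, which.max)
    size <- apply(abs(loadings), 1, max)
    loadings <- loadings[order(top, -size), , drop = FALSE]
    out <- formatC(loadings, format = "f", digits = 3)
    out[abs(loadings) < min.abs] <- ""
    noquote(out)
}

suppress.small(L)
```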

Rotations

Rotations of the principal components are used to produce solutions where the loadings tend to be closer to 0, 1, or -1, making interpretation of the solution easier.

The Varimax, Quartimax, and Equamax rotations are orthogonal, which means that the components produced are always uncorrelated with one another.

The Promax and Oblimin rotations are oblique, meaning that the components can be correlated with one another.

After rotation, components with large negative loadings will have signs flipped, so that the largest loadings are positive, to make interpretation easier.
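
The difference between orthogonal and oblique rotations, and the sign flip, can be sketched with base R's varimax() and promax(). This is an illustration under stated assumptions: the software is documented as using the GPArotation package, so results may differ in detail, and the sign-flipping rule shown here is one simple reading of the description above. All object names are hypothetical.

```r
# Illustrative sketch using base R's varimax() and promax(), not GPArotation.
x <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec", "drat")]
e <- eigen(cor(x), symmetric = TRUE)
L <- e$vectors[, 1:2] %*% diag(sqrt(e$values[1:2]))   # unrotated loadings
rownames(L) <- names(x)

# Orthogonal rotation: the rotation matrix has orthonormal columns
vm <- varimax(L)
L.varimax <- unclass(vm$loadings)

# Communalities (row sums of squared loadings) are unchanged by an orthogonal rotation
all.equal(rowSums(L^2), rowSums(L.varimax^2))

# Oblique rotation: the components are allowed to correlate
pm <- promax(L, m = 4)                   # m plays the role of Kappa
pattern <- unclass(pm$loadings)          # pattern matrix (loadings)
phi <- solve(crossprod(pm$rotmat))       # correlations between the components
structure.matrix <- pattern %*% phi      # loadings multiplied by those correlations

# Flip signs so that each component's largest absolute loading is positive
flip <- apply(L.varimax, 2, function(v) sign(v[which.max(abs(v))]))
L.flipped <- sweep(L.varimax, 2, flip, `*`)
```

For an orthogonal rotation the correlations between components form an identity matrix, so the structure matrix and the loadings coincide. They differ only for oblique rotations such as Promax and Oblimin.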

Scores

Principal components analysis can be used to create a new set of variables that give each case's value (score) on each of the components that have been identified. Here, this is done using the Regression method. The coefficients for transforming the original variables to the new set of scores are shown in the Detailed Output under Score coefficient matrix, and the new variables can be saved to your data set using Dimension Reduction - Save Variable(s) or by clicking on the button Data > Save variables.
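
A minimal sketch of regression-method scores for unrotated components is shown below, assuming the textbook form in which the score coefficients are the inverse correlation matrix multiplied by the loadings. Whether the software computes them in exactly this way is an assumption; the object names are hypothetical and the data are the built-in mtcars data set.

```r
# Illustrative sketch of regression-method scores for unrotated components.
x <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec", "drat")]
R <- cor(x)
e <- eigen(R, symmetric = TRUE)
L <- e$vectors[, 1:2] %*% diag(sqrt(e$values[1:2]))
dimnames(L) <- list(names(x), c("Component 1", "Component 2"))

# Score coefficient matrix: W = R^{-1} L
W <- solve(R, L)

# Scores: standardized data multiplied by the score coefficients
Z <- scale(x)
scores <- Z %*% W

round(colMeans(scores), 10)   # each component has mean 0
round(cov(scores), 10)        # unit variances and zero covariances
```

Scores computed this way for unrotated components are standardized and uncorrelated with one another. After an oblique rotation the scores would generally be correlated.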

Diagnostics

Test - Bartlett Test of Sphericity can be used to test whether or not the input variables are correlated with one another before conducting the principal components analysis.
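
For reference, Bartlett's test of sphericity can be sketched by hand in base R. The statistic is -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom, where n is the number of cases, p the number of variables, and R the correlation matrix. The function name bartlett.sphericity is hypothetical, and the sketch uses complete cases only, which may differ from the software's handling of missing data.

```r
# Illustrative sketch of Bartlett's test of sphericity in base R.
# H0: the population correlation matrix is the identity (variables uncorrelated).
bartlett.sphericity <- function(x) {
    x <- na.omit(as.matrix(x))          # complete cases only, for simplicity
    n <- nrow(x)
    p <- ncol(x)
    R <- cor(x)
    log.det <- as.numeric(determinant(R, logarithm = TRUE)$modulus)
    chisq <- -(n - 1 - (2 * p + 5) / 6) * log.det
    df <- p * (p - 1) / 2
    list(statistic = chisq, df = df,
         p.value = pchisq(chisq, df, lower.tail = FALSE))
}

bartlett.sphericity(mtcars[, c("mpg", "disp", "hp", "wt", "qsec", "drat")])
```

A small p-value indicates that the variables are correlated, so a principal components analysis has shared variation to summarize.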

Output

Several different outputs can be selected from the Data > Output dropdown. Examples of three of them follow.

Example Output - Loadings Table

The main output is the Loadings Table, which shows the loadings of each of the original variables on the components that have been identified, along with the amount of variance explained by the components. In this example, two components have been obtained from the preference scores for the six cola brands. Component 1 is most strongly correlated with the scores for the diet drinks, and Component 2 is most strongly correlated with the scores for the full-sugar drinks.

pca loadings.png

Example Output - Detailed Output

To get more detail, change the Output to Detailed Output, which shows more of the underlying metrics associated with the analysis.

Principal Components Analysis

Input: Correlation matrix
Missing data setting: Use partial data (pairwise correlations)
Sample size: 327 to 327
Rotation: Varimax

Rotated loadings:
                           Component 1 Component 2
Brand attitude: Diet Pepsi  0.767                 
Brand attitude: Pepsi Max   0.739                 
Brand attitude: Diet Coke   0.715                 
Brand attitude: Coke Zero   0.667                 
Brand attitude: Pepsi                   0.848     
Brand attitude: Coca-Cola               0.761     

                       Component 1 Component 2
Sum of Square Loadings       2.165       1.475
% of Variance               36.089      24.579
Cumulative %                36.089      60.668


Communalities:
                           Initial Extraction
Brand attitude: Coca-Cola        1      0.597
Brand attitude: Diet Coke        1      0.592
Brand attitude: Coke Zero        1      0.448
Brand attitude: Pepsi            1      0.775
Brand attitude: Diet Pepsi       1      0.623
Brand attitude: Pepsi Max        1      0.605


Score Coefficient Matrix:
                           Component 1 Component 2
Brand attitude: Coca-Cola       -0.106       0.529
Brand attitude: Diet Coke        0.350      -0.236
Brand attitude: Coke Zero        0.315      -0.075
Brand attitude: Pepsi            0.062       0.567
Brand attitude: Diet Pepsi       0.347       0.083
Brand attitude: Pepsi Max        0.331       0.123
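
The figures in the Detailed Output above are related by simple arithmetic, which the following sketch checks using the rounded values copied from that output. The variable names are hypothetical, and small differences from the reported percentages are due to rounding.

```r
# Values copied from the Detailed Output above (rounded to 3 decimals).
ss.loadings <- c("Component 1" = 2.165, "Component 2" = 1.475)
communalities <- c(0.597, 0.592, 0.448, 0.775, 0.623, 0.605)
n.variables <- length(communalities)   # six brand attitude variables

# With a correlation matrix each variable contributes one unit of variance,
# so % of variance = 100 * sum of squared loadings / number of variables
pct <- 100 * ss.loadings / n.variables
round(pct, 1)           # close to the reported 36.089 and 24.579
round(cumsum(pct), 1)   # the last value is close to the reported 60.668

# The communalities add up to the total of the sums of squared loadings
sum(communalities)
sum(ss.loadings)
```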

Example Output - Variance Explained

To see the Variance Explained of all of the components, change the Output to Variance Explained.

PCA variance explained.png
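
A table of this kind can be sketched from the eigenvalues of the correlation matrix. The example below uses the built-in mtcars data and hypothetical column names; it is not the software's code, and its layout may differ from the output pictured above.

```r
# Illustrative sketch of a variance-explained table from unrotated components.
x <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec", "drat")]
eigenvalues <- eigen(cor(x), symmetric = TRUE)$values

variance.explained <- data.frame(
    Component = seq_along(eigenvalues),
    Eigenvalue = eigenvalues,
    Percent = 100 * eigenvalues / sum(eigenvalues),
    Cumulative = 100 * cumsum(eigenvalues) / sum(eigenvalues)
)
print(round(variance.explained, 3))
```

The eigenvalues are returned in decreasing order, so the cumulative percentage reaches 100 at the last component.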

Additional Properties

The analysis stores additional information, which you can inspect by referring to the analysis from custom R code in another item:

#change YourReferenceName to the reference name (under Properties > General) of your analysis
item = YourReferenceName
str(item)

Properties which may be of interest are:

Loadings:

item$loadings

Scores:

item$scores

If the Output is set to 2D Scatterplot, you can create a table of the scores for other components, to be plotted in a separate scatterplot, as follows:

#CHANGE the principal.components.analysis below to the reference name of your PCA
yourpca=principal.components.analysis

#CHANGE the components to the two that you want
thecomps=yourpca$scores[,c("Component 2","Component 3")]

#pull in the default chart data to get the appropriate groups
currchart=attr(yourpca, "ChartData")

#combine the new components with the groups and dummy size column
finalchart=data.frame(thecomps,
                 Size=rep(.7,NROW(thecomps)),
                 Color=as.character(currchart$Group))
finalchart

Acknowledgements

The R package psych is used to extract the original, unrotated components from the input data.

The R package GPArotation is used to conduct rotations.

Method

In Displayr: How to Do Principal Component Analysis in Displayr
In Q: How to Do a Principal Components Analysis in Q