If infolevel[Statistics] is set to 1, the PCA command prints a summary of the results.
Values     Proportion of variance   St. deviation
11.2240    0.9904                   3.3502
 0.1093    0.0096                   0.3306
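The three columns of the summary appear to be related in the usual way: each value is the variance along one component, the proportion is that value divided by the sum of all values, and the standard deviation is its square root. The sketch below is plain Python rather than Maple and does not call the PCA command; it only checks this assumed relationship against the printed numbers.

```python
import math

# "Values" column copied from the summary above (assumed to be component variances).
values = [11.2240, 0.1093]

total = sum(values)
proportions = [v / total for v in values]   # share of the total variance
std_devs = [math.sqrt(v) for v in values]   # standard deviation per component

for v, p, s in zip(values, proportions, std_devs):
    print(f"{v:8.4f} {p:8.4f} {s:8.4f}")
```

Rounded to four decimals, the computed proportions and standard deviations agree with the table.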
The rotation matrix from the principal component analysis:
The principal components can be retrieved from the returned result using :-principalcomponents.
The following plot shows the original data set (in red) and the results from the principal component analysis.
One use of principal component analysis is to eliminate dimensions from the data. The following plot shows the original two dimensions of the data, X and Y, together with the two resulting principal components. The second component accounts for less than 1% of the variance (a proportion of 0.0096 in the summary above), suggesting that it can be removed.
The following example performs a principal component analysis on multi-dimensional data. The components that have the least impact on the variance are discarded, and the simplified data is reconstructed from the remaining components.
The columns option keeps the columns of data with the greatest variance. Here, the component with the least impact on the variability of the data set is discarded.
Values     Proportion of variance   St. deviation
8.3208     0.8782                   2.8846
1.1119     0.1173                   1.0545
0.0425     0.0045                   0.2061
The data can be reconstructed using the principal components:
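As a rough illustration of reconstruction from a reduced set of components, the following plain-Python sketch (not Maple, and not the PCA command itself) works on a small two-dimensional data set that is invented for this example. It centers the data, finds the leading eigenvector of the sample covariance matrix in closed form, projects onto it, and maps the scores back to the original coordinates.

```python
import math

# Hypothetical 2-D data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
n = len(xs)

# Center the data.
mean_x = sum(xs) / n
mean_y = sum(ys) / n
cx = [x - mean_x for x in xs]
cy = [y - mean_y for y in ys]

# Sample covariance matrix [[a, b], [b, c]].
a = sum(u * u for u in cx) / (n - 1)
b = sum(u * v for u, v in zip(cx, cy)) / (n - 1)
c = sum(v * v for v in cy) / (n - 1)

# Eigenvalues of a symmetric 2x2 matrix in closed form.
half_trace = (a + c) / 2
radius = math.sqrt(((a - c) / 2) ** 2 + b * b)
lam1, lam2 = half_trace + radius, half_trace - radius

# Unit eigenvector for lam1 (assumes b != 0, which holds for this data).
norm = math.hypot(b, lam1 - a)
v1 = (b / norm, (lam1 - a) / norm)

# Scores on the first component, then reconstruction from that component alone.
scores = [u * v1[0] + v * v1[1] for u, v in zip(cx, cy)]
recon = [(mean_x + s * v1[0], mean_y + s * v1[1]) for s in scores]

# Total squared reconstruction error.
sq_error = sum((x - rx) ** 2 + (y - ry) ** 2
               for x, y, (rx, ry) in zip(xs, ys, recon))
print(lam1, lam2, sq_error)
```

The squared reconstruction error equals (n - 1) times the discarded eigenvalue, which is why dropping a component with a small value loses little information.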
The correlation option is used to compute the principal components using the correlation matrix instead of the covariance matrix. This is often done while using the eigenvector method.
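To see why the choice between the two matrices matters, the plain-Python sketch below (not Maple) rescales a covariance matrix into a correlation matrix and compares the share of variance attributed to the first component in each case. The 2x2 covariance matrix is hypothetical and chosen only so that the two variables have different scales.

```python
import math

# Hypothetical 2x2 covariance matrix, invented for illustration.
cov = [[4.0, 1.2],
       [1.2, 1.0]]

# Divide each entry by the two standard deviations to obtain correlations.
sd = [math.sqrt(cov[i][i]) for i in range(2)]
corr = [[cov[i][j] / (sd[i] * sd[j]) for j in range(2)] for i in range(2)]

def eigenvalues_2x2(m):
    """Eigenvalues of a symmetric 2x2 matrix, largest first."""
    half_trace = (m[0][0] + m[1][1]) / 2
    radius = math.sqrt(((m[0][0] - m[1][1]) / 2) ** 2 + m[0][1] ** 2)
    return [half_trace + radius, half_trace - radius]

cov_values = eigenvalues_2x2(cov)
corr_values = eigenvalues_2x2(corr)

# Proportion of variance carried by the first component in each case.
cov_share = cov_values[0] / sum(cov_values)
corr_share = corr_values[0] / sum(corr_values)
print(cov_share, corr_share)
```

With the covariance matrix, the variable with the larger variance dominates the first component; with the correlation matrix, both variables are put on an equal footing before the components are computed.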
A scree plot is often used to visually determine which principal components explain the majority of the variance.
Values     Proportion of variance   St. deviation
31.4897    0.6658                   5.6116
13.2270    0.2797                   3.6369
 2.3575    0.0498                   1.5354
 0.2180    0.0046                   0.4669
 0.0031    0.0001                   0.0558
The following plot indicates that the first three components account for approximately 99.5% of the variance.
The tolerance or removecolumns options can be used to remove the components with the least effect on the overall variance. Using tolerance = 0.01 removes any principal component whose value is at most 0.01 times the value of the first principal component (here 0.01 × 31.4897 ≈ 0.3149), namely the last two.
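The cumulative proportion and the tolerance rule described above can be checked by hand against the printed summary. The following plain-Python sketch (not Maple, and not a call to the PCA command) takes the Values column from the table and applies the rule as it is worded here, which is an assumption about how the option behaves internally.

```python
# "Values" column copied from the summary above.
values = [31.4897, 13.2270, 2.3575, 0.2180, 0.0031]
tolerance = 0.01

# Cumulative proportion of variance after each component.
total = sum(values)
cumulative = []
running = 0.0
for v in values:
    running += v
    cumulative.append(running / total)

# Tolerance rule as described in the text: drop a component when its value
# is at most tolerance times the value of the first component.
threshold = tolerance * values[0]
kept = [v for v in values if v > threshold]
removed = [v for v in values if v <= threshold]

print(cumulative)
print(kept, removed)
```

The third entry of the cumulative list is the share of variance explained by the first three components, and the removed list contains the two smallest values.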
A biplot can also be used to show the first two components and the observations on the same set of axes. The first principal component is plotted on the x-axis and the second on the y-axis.