Principal Component Analysis

by Irina 5. May 2007 01:49

The Basics of Principal Component Analysis

Principal component analysis is appropriate when you have obtained measures on a number of observed variables, believe that there is some redundancy in those variables, and wish to develop a smaller number of artificial variables (called principal components) that will account for most of the variance in the observed variables. In this case, redundancy means that some of the variables are correlated with one another, possibly because they are measuring the same construct. The principal components may then be used as predictor or criterion variables in subsequent analyses.

Because it is a variable reduction procedure, principal component analysis is similar in many respects to exploratory factor analysis. In fact, the steps followed when conducting a principal component analysis are virtually identical to those followed when conducting an exploratory factor analysis. However, there are significant conceptual differences between the two procedures, and it is important that you do not mistakenly claim that you are performing factor analysis when you are actually performing principal component analysis.

What is a Principal Component?

How principal components are computed.

Technically, a principal component can be defined as a linear combination of optimally weighted observed variables. To understand this definition, it helps to first describe how subject scores on a principal component are computed. It is possible to calculate a score for each subject on a given principal component. For example, in a study in which subjects answer seven questionnaire items that yield two components, each subject would have two scores: one on the satisfaction with supervision component, and one on the satisfaction with pay component. The subject’s actual scores on the seven questionnaire items would be optimally weighted and then summed to compute their scores on a given component.

For example, assume that component 1 in the present study was the “satisfaction with supervision” component. You could determine each subject’s score on principal component 1 by using the following fictitious formula:
C1 = .44 (X1) + .40 (X2) + .47 (X3) + .32 (X4) + .02 (X5) + .01 (X6) + .03 (X7)
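The weighted sum in the formula above can be sketched in a few lines of Python. The weights are copied from the fictitious formula; the subject's item scores and the function name `component_score` are made up for illustration, and in a real analysis the item scores would normally be standardized first.

```python
# Weights for X1..X7, copied from the fictitious formula for component 1.
WEIGHTS = [0.44, 0.40, 0.47, 0.32, 0.02, 0.01, 0.03]


def component_score(item_scores, weights=WEIGHTS):
    """Return one subject's component score: the weighted sum of item scores."""
    if len(item_scores) != len(weights):
        raise ValueError("need exactly one score per weight")
    return sum(w * x for w, x in zip(weights, item_scores))


# Hypothetical subject: invented responses to the seven questionnaire items.
subject = [6, 5, 7, 6, 2, 3, 1]
print(round(component_score(subject), 2))
```

Items X1 to X4 carry large weights and X5 to X7 carry weights near zero, so this component score is driven almost entirely by the first four items.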

The SAS System’s PROC FACTOR solves for these weights by using a special type of equation called an eigenequation. The weights produced by these eigenequations are optimal weights in the sense that, for a given set of data, no other set of weights could produce a set of components that are more successful in accounting for variance in the observed variables. The weights are created so as to satisfy a principle of least squares similar (but not identical) to the principle of least squares used in multiple regression.
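To make the idea of an eigenequation concrete, here is a minimal sketch that finds the weights of the first component for a small correlation matrix. It uses power iteration, which is a simple stand-in chosen for illustration and is not a description of how PROC FACTOR works internally. The 3-by-3 correlation matrix `CORR` is invented, and the function names are hypothetical.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def first_component(corr, iterations=500):
    """Power iteration: return (eigenvalue, unit-length weights) of the
    largest eigenvalue of a correlation matrix with positive entries."""
    n = len(corr)
    weights = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        product = mat_vec(corr, weights)
        norm = math.sqrt(sum(x * x for x in product))
        weights = [x / norm for x in product]
    # Rayleigh quotient w'Rw: the variance accounted for by these weights.
    eigenvalue = sum(w * p for w, p in zip(weights, mat_vec(corr, weights)))
    return eigenvalue, weights


# Hypothetical correlation matrix for three observed variables.
CORR = [
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
]

eigenvalue, weights = first_component(CORR)
# The eigenequation R w = lambda w should hold, so the residual is near zero.
residual = [p - eigenvalue * w for p, w in zip(mat_vec(CORR, weights), weights)]
print(eigenvalue, weights, residual)
```

The eigenvalue is the amount of variance the component accounts for. Any other unit-length set of weights, for example putting all the weight on the first variable, gives a smaller value, which is the sense in which the weights are optimal.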

Number of components extracted.

In reality, the number of components extracted in a principal component analysis is equal to the number of observed variables being analyzed. However, in most analyses, only the first few components account for meaningful amounts of variance, so only these first few components are retained, interpreted, and used in subsequent analyses (such as in multiple regression analyses).
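The following sketch illustrates why only the first few components are usually kept. It takes one eigenvalue per component (as many as there are observed variables) and reports the proportion of variance each accounts for. The seven eigenvalues are invented for illustration, and `variance_accounted_for` is a hypothetical helper name.

```python
def variance_accounted_for(eigenvalues):
    """Return (proportion, cumulative proportion) pairs, one per component."""
    total = sum(eigenvalues)
    rows = []
    cumulative = 0.0
    for value in eigenvalues:
        proportion = value / total
        cumulative += proportion
        rows.append((proportion, cumulative))
    return rows


# Hypothetical eigenvalues for seven observed variables (made up; they sum to 7).
EIGENVALUES = [2.9, 1.8, 0.8, 0.6, 0.4, 0.3, 0.2]

rows = variance_accounted_for(EIGENVALUES)
for number, (proportion, cumulative) in enumerate(rows, start=1):
    print(f"Component {number}: {proportion:.1%} of variance, "
          f"{cumulative:.1%} cumulative")
```

With these made-up values, seven components are extracted but the first two already account for about 67% of the variance, so the remaining five add comparatively little.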

What is meant by “total variance” in the data set?

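The “total variance” in the data set is simply the sum of the variances of the observed variables. In principal component analysis, the observed variables are standardized so that each has a variance of one, which means each variable contributes one unit of variance to the total. Because of this, the total variance in a principal component analysis will always be equal to the number of observed variables being analyzed.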
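A short check of this claim, using only the Python standard library: after standardizing, each variable has a variance of one, so the variances add up to the number of variables. The data values and variable names are invented for illustration.

```python
import statistics


def standardize(values):
    """Rescale values to mean 0 and (sample) standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical raw scores for three observed variables (five subjects each).
data = {
    "X1": [2, 4, 4, 5, 7],
    "X2": [10, 12, 9, 15, 14],
    "X3": [1.5, 0.5, 2.0, 3.5, 2.5],
}

# Sum of the variances of the standardized variables.
total_variance = sum(
    statistics.variance(standardize(column)) for column in data.values()
)
print(total_variance)  # equals the number of variables, up to rounding error
```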

Principal Component Analysis is Not Factor Analysis!

Both procedures can be performed with the SAS System’s FACTOR procedure, and they sometimes even provide very similar results. However, factor analysis assumes that the covariation in the observed variables is due to the presence of one or more latent variables (factors) that exert causal influence on these observed variables. In contrast, principal component analysis makes no assumption about an underlying causal model. Principal component analysis is simply a variable reduction procedure that (typically) results in a relatively small number of components that account for most of the variance in a set of observed variables.

What is a communality?

A communality refers to the percent of variance in an observed variable that is accounted for by the retained components (or factors).
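For orthogonal (uncorrelated) components, a variable's communality can be computed as the sum of its squared loadings on the retained components. The sketch below assumes that setting; the loadings are invented for illustration and `communalities` is a hypothetical helper name.

```python
def communalities(loadings):
    """Return each variable's communality: the sum of its squared loadings
    on the retained (orthogonal) components."""
    return {
        name: sum(loading * loading for loading in row)
        for name, row in loadings.items()
    }


# Hypothetical loadings of three variables on two retained components.
LOADINGS = {
    "X1": (0.80, 0.10),
    "X2": (0.70, 0.20),
    "X5": (0.10, 0.90),
}

result = communalities(LOADINGS)
for name, value in result.items():
    print(f"{name}: {value:.0%} of variance accounted for")
```

With these made-up loadings, the two retained components account for 65% of the variance in X1, 53% in X2, and 82% in X5.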

SAS Program and Output.

You may perform a principal component analysis using either the PRINCOMP or the FACTOR procedure. The general form of a PROC FACTOR program is:
PROC FACTOR DATA=data-set-name
            PREPLOT PLOT
            SIMPLE
            METHOD=PRIN
            PRIORS=ONE
            MINEIGEN=p
            SCREE
            ROTATE=VARIMAX
            ROUND
            FLAG=desired-size-of-"significant"-factor-loadings ;
   VAR variables-to-be-analyzed ;
RUN;
FLAG=desired-size-of-"significant"-factor-loadings causes the printed output to flag (with an asterisk) any factor loading whose absolute value is greater than the specified size.
METHOD=factor-extraction-method specifies the method to be used in extracting the factors or components. The current program specifies METHOD=PRIN to request that the principal axis (principal factors) method be used for the initial extraction. Combined with PRIORS=ONE, this is the appropriate method for a principal component analysis.
PREPLOT requests a factor plot before rotation.
PLOT requests a factor plot after rotation.
MINEIGEN=p specifies the critical eigenvalue a component must display if that component is to be retained (here, p = the critical eigenvalue). This option causes PROC FACTOR to retain and rotate any component whose eigenvalue is p or larger. Negative values are not allowed.
NFACT=n allows you to specify the number of components to be retained and rotated, where n = the number of components. It does not appear in the preceding program.
PRIORS=prior-communality-estimates specifies prior communality estimates. Users should always specify PRIORS=ONE to perform a principal component analysis.
ROTATE=rotation-method specifies the rotation method to be used. The preceding program requests a varimax rotation, which results in orthogonal (uncorrelated) components.
ROUND causes all coefficients to be multiplied by 100 and rounded to the nearest integer (thus eliminating the decimal point), which has the effect of limiting them to two decimal places.
SIMPLE requests simple descriptive statistics: the number of usable cases on which the analysis was performed, and the means and standard deviations of the observed variables.
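The retention and display options described above can be mimicked in a few lines of Python. This is only a sketch of the rules as stated in the option descriptions, not a reproduction of PROC FACTOR output; the function names, eigenvalues, and loadings are all hypothetical.

```python
def retain_components(eigenvalues, mineigen=None, nfact=None):
    """Return the eigenvalues of the components that would be retained.

    mineigen mimics MINEIGEN=p (keep components with eigenvalue >= p);
    nfact mimics NFACT=n (keep at most the first n components).
    """
    ordered = sorted(eigenvalues, reverse=True)
    if mineigen is not None:
        if mineigen < 0:
            raise ValueError("negative values are not allowed")
        ordered = [value for value in ordered if value >= mineigen]
    if nfact is not None:
        ordered = ordered[:nfact]
    return ordered


def display_loading(loading, flag):
    """Format a loading as ROUND and FLAG are described: multiply by 100,
    round to an integer, and add an asterisk when |loading| exceeds flag."""
    text = str(round(loading * 100))
    return text + "*" if abs(loading) > flag else text


# Hypothetical eigenvalues and loadings, for illustration only.
EIGENVALUES = [2.9, 1.8, 0.8, 0.6, 0.4, 0.3, 0.2]
print(retain_components(EIGENVALUES, mineigen=1))
print(retain_components(EIGENVALUES, nfact=3))
print([display_loading(x, flag=0.40) for x in (0.44, 0.32, -0.07)])
```

Setting `mineigen=1` corresponds to keeping only components whose eigenvalue is at least one, which here retains the first two of the seven made-up components.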

About the author

Irina Spivak
Team Leader at G-Stat.

