Factor Analysis
Statistics 784: Multivariate Analysis, NC State University

- Principal Components Analysis, e.g. of stock price movements, sometimes suggests that several variables may be responding to a small number of underlying forces.
- In the factor model, we assume that such latent variables, or factors, exist.

The Orthogonal Factor Model

- The model equations are
  \[
  \begin{aligned}
  X_1 - \mu_1 &= l_{1,1} F_1 + l_{1,2} F_2 + \cdots + l_{1,m} F_m + \epsilon_1, \\
  X_2 - \mu_2 &= l_{2,1} F_1 + l_{2,2} F_2 + \cdots + l_{2,m} F_m + \epsilon_2, \\
  &\;\;\vdots \\
  X_p - \mu_p &= l_{p,1} F_1 + l_{p,2} F_2 + \cdots + l_{p,m} F_m + \epsilon_p,
  \end{aligned}
  \]
  where:
  - $F_1, F_2, \dots, F_m$ are the common factors (latent variables);
  - $l_{i,j}$ is the loading of variable $i$, $X_i$, on factor $j$, $F_j$;
  - $\epsilon_i$ is a specific factor, affecting only $X_i$.
- In matrix form:
  \[
  \underset{p \times 1}{X - \mu} = \underset{p \times m}{L} \, \underset{m \times 1}{F} + \underset{p \times 1}{\epsilon}.
  \]
- To make this identifiable, we further assume, with no loss of generality:
  \[
  \underset{m \times 1}{E(F)} = 0, \qquad \underset{m \times m}{\mathrm{Cov}(F)} = I, \qquad \underset{p \times 1}{E(\epsilon)} = 0, \qquad \underset{p \times m}{\mathrm{Cov}(\epsilon, F)} = 0;
  \]
- and with serious loss of generality:
  \[
  \mathrm{Cov}(\epsilon) = \Psi = \mathrm{diag}(\psi_1, \psi_2, \dots, \psi_p).
  \]
- In terms of the observable variables $X$, these assumptions mean that
  \[
  E(X) = \mu, \qquad \mathrm{Cov}(X) = \Sigma = L L' + \Psi.
  \]
  Usually $X$ is standardized, so $\Sigma = R$.
- The observable $X$ and the unobservable $F$ are related by $\mathrm{Cov}(X, F) = L$.
- Some terminology: the $(i, i)$ entry of the matrix equation $\Sigma = L L' + \Psi$ is
  \[
  \underbrace{\sigma_{i,i}}_{\mathrm{Var}(X_i)} = \underbrace{l_{i,1}^2 + l_{i,2}^2 + \cdots + l_{i,m}^2}_{\text{communality}} + \underbrace{\psi_i}_{\text{specific variance}},
  \]
  or $\sigma_{i,i} = h_i^2 + \psi_i$, where $h_i^2 = l_{i,1}^2 + l_{i,2}^2 + \cdots + l_{i,m}^2$ is the $i$th communality.
- Note that if $T$ is $(m \times m)$ orthogonal, then $(LT)(LT)' = L L'$, so the loadings $LT$ generate the same $\Sigma$ as $L$: loadings are not unique.

Existence of Factor Representation

- For any $p$, every $(p \times p)$ $\Sigma$ can be factorized as $\Sigma = L L'$ for $(p \times p)$ $L$, which is a factor representation with $m = p$ and $\Psi = 0$; however, $m = p$ is not much use: we usually want $m \ll p$.
- For $p = 3$, every $(3 \times 3)$ $\Sigma$ can be represented as $\Sigma = L L' + \Psi$ for $(3 \times 1)$ $L$, which is a factor representation with $m = 1$, but $\Psi$ may have negative elements.
- In general, we can only approximate $\Sigma$ by $L L' + \Psi$.
- Principal components method: the spectral decomposition of $\Sigma$ is
  \[
  \Sigma = E \Lambda E' = \left(E \Lambda^{1/2}\right)\left(E \Lambda^{1/2}\right)' = L L'
  \]
  with $m = p$.
- If $\lambda_1 + \lambda_2 + \cdots + \lambda_m \gg \lambda_{m+1} + \cdots + \lambda_p$, and $L^{(m)}$ is the first $m$ columns of $L$, then $\Sigma \approx L^{(m)} L^{(m)\prime}$ gives such an approximation with $\Psi = 0$.
- The remainder term $\Sigma - L^{(m)} L^{(m)\prime}$ is non-negative definite, so its diagonal entries are non-negative; hence we can get a closer approximation as
  \[
  \Sigma \approx L^{(m)} L^{(m)\prime} + \Psi^{(m)}, \qquad \Psi^{(m)} = \mathrm{diag}\left(\Sigma - L^{(m)} L^{(m)\prime}\right).
  \]
- SAS proc factor program and output:

    proc factor data = all method = prin;
      var cvx -- xom;
      title 'Method = Principal Components';
    proc factor data = all method = prin nfact = 2 plot;
      var cvx -- xom;
      title 'Method = Principal Components, 2 factors';
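- The extraction is easy to illustrate in R (which also appears later in these notes for the varimax example). The following is a minimal sketch, with data simulated from a two-factor model standing in for the course's stock-return data (cvx -- xom); all object names here are illustrative, and later sketches in these notes reuse them.

    # Simulate from a two-factor orthogonal factor model: X - mu = L F + eps
    set.seed(1)
    n <- 200; p <- 6; m <- 2
    L.true   <- matrix(runif(p * m, -1, 1), p, m)    # true loadings
    Psi.true <- runif(p, 0.3, 0.7)                   # true specific variances
    F.mat    <- matrix(rnorm(n * m), n, m)           # common factors
    eps      <- matrix(rnorm(n * p), n, p) %*% diag(sqrt(Psi.true))
    X        <- F.mat %*% t(L.true) + eps

    # Principal components extraction: work with R, so Sigma = R
    R   <- cor(X)
    eig <- eigen(R, symmetric = TRUE)

    # Loadings: first m eigenvectors scaled by the square roots of their
    # eigenvalues, so L %*% t(L) is the best rank-m approximation to R
    L   <- eig$vectors[, 1:m] %*% diag(sqrt(eig$values[1:m]))
    Psi <- diag(R - L %*% t(L))   # match the diagonal of the residual

    h2 <- rowSums(L^2)            # communalities h_i^2
    round(cbind(h2, Psi), 3)      # h_i^2 + psi_i = 1 for each variable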
Principal Factor Solution

- Recall the Orthogonal Factor Model $X - \mu = L F + \epsilon$, which implies $\Sigma = L L' + \Psi$.
- The $m$-factor Principal Component solution is to approximate $\Sigma$ (or, if we standardize the variables, $R$) by a rank-$m$ matrix using the spectral decomposition
  \[
  \Sigma = \lambda_1 e_1 e_1' + \cdots + \lambda_m e_m e_m' + \lambda_{m+1} e_{m+1} e_{m+1}' + \cdots + \lambda_p e_p e_p'.
  \]
- The first $m$ terms give the best rank-$m$ approximation to $\Sigma$.
- We can sometimes achieve higher communalities ($= \mathrm{diag}(L L')$) by specifying an initial estimate of the communalities, by iterating the solution, or both.
- Suppose we are working with $R$. Given initial communalities $h_i^{*2}$, form the reduced correlation matrix
  \[
  R_r =
  \begin{pmatrix}
  h_1^{*2} & r_{1,2} & \cdots & r_{1,p} \\
  r_{2,1} & h_2^{*2} & \cdots & r_{2,p} \\
  \vdots & \vdots & \ddots & \vdots \\
  r_{p,1} & r_{p,2} & \cdots & h_p^{*2}
  \end{pmatrix}.
  \]
- Now use the spectral decomposition of $R_r$ to find its best rank-$m$ approximation $R_r \approx L_r^* L_r^{*\prime}$.
- New communalities are
  \[
  \tilde{h}_i^{*2} = \sum_{j=1}^m l_{i,j}^{*2}.
  \]
- Find $\Psi$ by equating the diagonal terms:
  \[
  \tilde{\psi}_i^* = 1 - \tilde{h}_i^{*2}, \qquad \text{or} \qquad \tilde{\Psi}^* = I - \mathrm{diag}\left(L_r^* L_r^{*\prime}\right).
  \]
- This is the Principal Factor solution.
- The Principal Component solution is the special case where the initial communalities are all 1.
- In proc factor, use method = prin as for the Principal Component solution, but also specify the initial communalities:
  - the priors = ... option on the proc factor statement specifies a method, such as squared multiple correlations (priors = SMC);
  - the priors statement provides explicit numerical values.
- SAS program and output:

    proc factor data = all method = prin priors = smc;
      title 'Method = Principal Factors';
      var cvx -- xom;

- In this case, the communalities are smaller than for the Principal Component solution.
- Other choices for the priors option include:
  - MAX ⇒ maximum absolute correlation with any other variable;
  - ASMC ⇒ Adjusted SMC (adjusted to make their sum equal to the sum of the maximum absolute correlations);
  - ONE ⇒ 1;
  - RANDOM ⇒ uniform on (0, 1).

Iterated Principal Factors

- One issue with both Principal Components and Principal Factors: even if $S$ or $R$ is exactly of the form $L L' + \Psi$ (or, more likely, approximately of that form), neither method recovers $L$ and $\Psi$ (unless you specify the true communalities).
- Solution: iterate! Use the new communalities as initial communalities to get another set of Principal Factors, and repeat until nothing much changes (sketched in R below).
- In proc factor, use method = prinit; you may also specify the initial communalities (default = ONE).
- SAS program and output:

    proc factor data = all method = prinit;
      title 'Method = Iterated Principal Factors';
      var cvx -- xom;

- The communalities are still smaller than for the Principal Component solution, but larger than for Principal Factors.
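- The iteration is a short loop in R. A sketch, reusing R and m from the earlier simulation; the names h2 and L.pf are illustrative. A single pass with initial communalities of 1 is the Principal Component solution; a single pass with SMC priors is the plain Principal Factor solution; the loop corresponds to method = prinit.

    # Iterated principal factors with SMC initial communalities
    h2 <- 1 - 1 / diag(solve(R))      # squared multiple correlations (SMC)
    for (iter in 1:200) {
      Rr <- R
      diag(Rr) <- h2                  # reduced correlation matrix
      eig  <- eigen(Rr, symmetric = TRUE)
      L.pf <- eig$vectors[, 1:m] %*% diag(sqrt(pmax(eig$values[1:m], 0)))
      h2.new <- rowSums(L.pf^2)       # updated communalities
      done <- max(abs(h2.new - h2)) < 1e-6
      h2 <- h2.new                    # feed the communalities back in
      if (done) break                 # stop when nothing much changes
    }
    Psi.pf <- 1 - h2                  # specific variances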
Likelihood Methods

- If we assume that $X \sim N_p(\mu, \Sigma)$ with $\Sigma = L L' + \Psi$, we can fit by maximum likelihood:
  - $\hat{\mu} = \bar{x}$;
  - $L$ is not identified without a constraint (uniqueness condition) such as $L' \Psi^{-1} L$ = diagonal;
  - there is still no closed-form equation for $\hat{L}$; numerical optimization is required.
- We can also test hypotheses about $m$ with the likelihood ratio test (Bartlett's correction improves the $\chi^2$ approximation):
  - $H_0$: $m = m_0$; $H_A$: $m > m_0$;
  - $-2 \times$ log likelihood ratio $\sim \chi^2$ with $\frac{1}{2}\left[(p - m_0)^2 - p - m_0\right]$ degrees of freedom;
  - degrees of freedom $> 0 \iff m_0 < \frac{1}{2}\left(2p + 1 - \sqrt{8p + 1}\right)$;
  - e.g. for $p = 5$, $m_0 < 2.298$, so $m_0 \le 2$:

      p   m0   degrees of freedom
      5    0   10
      5    1    5
      5    2    1

- In proc factor, use method = ml; you may also specify the initial communalities (default = SMC). SAS program and output:

    proc factor data = all method = ml;
      var cvx -- xom;
      title 'Method = Maximum Likelihood';
    proc factor data = all method = ml heywood plot;
      var cvx -- xom;
      title 'Method = Maximum Likelihood with Heywood fixup';
    proc factor data = all method = ml ultraheywood plot;
      var cvx -- xom;
      title 'Method = Maximum Likelihood with Ultra-Heywood fixup';

- Note that the iteration can produce communalities > 1! Two fixes:
  - the Heywood option on the proc factor statement caps the communalities at 1;
  - the UltraHeywood option on the proc factor statement allows the iteration to continue with communalities > 1.

Scaling and the Likelihood

- If the maximum likelihood estimates for an $(n \times p)$ data matrix $X$ are $\hat{L}$ and $\hat{\Psi}$, and $Y = XD$ is a scaled data matrix, with the columns of $X$ scaled by the entries of the $(p \times p)$ diagonal matrix $D$, then the maximum likelihood estimates for $Y$ are $D\hat{L}$ and $D^2\hat{\Psi}$.
- That is, the MLEs are equivariant under scaling: $\hat{\Sigma}_Y = D \hat{\Sigma}_X D$.
- Proof: $L_Y(\mu, \Sigma) = L_X(D^{-1}\mu, D^{-1}\Sigma D^{-1})$, up to a constant factor.
- Consequently there is no essential distinction between covariance and correlation matrices.

Weighting and the Likelihood

- Recall the uniqueness condition $L' \Psi^{-1} L = \Delta$, diagonal.
- Write
  \[
  \Sigma^* = \Psi^{-1/2} \Sigma \Psi^{-1/2}
  = \Psi^{-1/2}(L L' + \Psi)\Psi^{-1/2}
  = \left(\Psi^{-1/2} L\right)\left(\Psi^{-1/2} L\right)' + I_p
  = L^* L^{*\prime} + I_p.
  \]
- $\Sigma^*$ is the weighted covariance matrix.
- Here $L^* = \Psi^{-1/2} L$ and $L^{*\prime} L^* = L' \Psi^{-1} L = \Delta$.
- Note:
  \[
  \Sigma^* L^* = L^* L^{*\prime} L^* + L^* = L^* \Delta + L^* = L^* (\Delta + I_m),
  \]
  so the columns of $L^*$ are the (unnormalized) eigenvectors of $\Sigma^*$, the weighted covariance matrix.
- Also $(\Sigma^* - I_p) L^* = L^* \Delta$, so the columns of $L^*$ are also the eigenvectors of
  \[
  \Sigma^* - I_p = \Psi^{-1/2}(\Sigma - \Psi)\Psi^{-1/2},
  \]
  the weighted reduced covariance matrix.
- Since the likelihood analysis is transparent to scaling, the weighted reduced correlation matrix gives essentially the same results as the weighted reduced covariance matrix.
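- In R, stats::factanal() fits this same ML model. A hedged sketch, reusing X, p, and m from the earlier simulation; it checks the degrees-of-freedom formula above, and (assuming, as in R's implementation, that the unrotated solution is computed under the canonical uniqueness condition) the eigenvector characterization from the weighting discussion.

    # ML factor analysis: factanal() maximizes the likelihood under
    # Sigma = LL' + Psi and reports the chi-square test of H0: m = m0
    fit    <- factanal(X, factors = m, rotation = "none")
    L.ml   <- unclass(fit$loadings)   # p x m ML loadings
    Psi.ml <- fit$uniquenesses        # ML specific variances psi_i

    # Test statistic and df; the df should equal ((p - m0)^2 - p - m0)/2
    c(fit$STATISTIC, fit$dof, ((p - m)^2 - p - m) / 2)

    # Uniqueness condition: L' Psi^-1 L should be (nearly) diagonal
    Delta <- t(L.ml) %*% (L.ml / Psi.ml)
    round(Delta, 6)

    # Weighted reduced correlation matrix Psi^-1/2 (R - Psi) Psi^-1/2;
    # its leading eigenvectors should span the same space as Psi^-1/2 L
    Wr <- (cor(X) - diag(Psi.ml)) / sqrt(Psi.ml %o% Psi.ml)
    # compare eigen(Wr)$vectors[, 1:m] with L.ml / sqrt(Psi.ml)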
Factor Rotation

- In the orthogonal factor model $X - \mu = L F + \epsilon$, the factor loadings are not always easily interpreted.
- J&W (p. 504): "Ideally, we should like to see a pattern of loadings such that each variable loads highly on a single factor and has small to moderate loadings on the remaining factors."
- That is, each row of $L$ should have a single large entry.
- Recall from the covariance equation $\Sigma = L L' + \Psi$ that $L$ and $LT$ give the same $\Sigma$ for any orthogonal $T$.
- We can choose $T$ to make the rotated loadings $LT$ more readily interpreted.
- Note that rotation changes neither $\Sigma$ nor $\Psi$, and hence the communalities are also unchanged.

The Varimax Criterion

- Kaiser proposed a criterion that measures interpretability:
  - $\hat{L}$ is some set of loadings with communalities $\hat{h}_i^2$, $i = 1, 2, \dots, p$;
  - $\hat{L}^*$ is a set of rotated loadings, $\hat{L}^* = \hat{L} T$;
  - $\tilde{l}_{i,j}^* = \hat{l}_{i,j}^* / \hat{h}_i$ are scaled loadings;
  - the criterion is
    \[
    V = \frac{1}{p} \sum_{j=1}^m \left[ \sum_{i=1}^p \tilde{l}_{i,j}^{*4} - \frac{1}{p} \left( \sum_{i=1}^p \tilde{l}_{i,j}^{*2} \right)^2 \right].
    \]
- The term in brackets is proportional to the variance of the squared scaled loadings $\tilde{l}_{i,j}^{*2}$ in column $j$.
- Making this variance large tends to produce two clusters of scaled loadings, one of small values and one of large values. So each column of the rotated loading matrix tends to contain a group of large loadings, which identify the variables associated with the factor, while the remaining loadings are small.
- Example: weekly returns for the 30 Dow Industrials stocks from January 2005 to March 2007 (115 returns). R code to rotate Principal Components 2-10:

    dowPrcomp = prcomp(dow, scale. = TRUE)
    dowVmax = varimax(dowPrcomp$rotation[ , 2:10], normalize = FALSE)
    loadings(dowVmax)

- Note: when R prints the loadings, entries with absolute value below a cutoff (default: 0.1) are printed as blanks, to draw attention to the larger values.

  [loadings() output for the 30 Dow stocks (AA through XOM) on PC2-PC10 omitted.]

- In proc factor, use rotate = varimax; you may also request plots both before (preplot) and after (plot) rotation. SAS program and output:

    proc factor data = all method = prinit nfact = 2
                rotate = varimax preplot plot out = stout;
      title 'Method = Iterated Principal Factors with Varimax Rotation';
      var cvx -- xom;
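- The criterion is short enough to compute directly. A sketch, reusing the ML loadings L.ml from the earlier sketch; varimax_criterion is an illustrative helper, not from the slides. With its default Kaiser normalization (normalize = TRUE), stats::varimax() optimizes this scaled-loadings criterion (up to constant factors), so the rotated value should not decrease.

    # Kaiser's varimax criterion V computed from a loadings matrix
    varimax_criterion <- function(L) {
      Lt <- L / sqrt(rowSums(L^2))    # scale rows by communalities h_i
      sum(colSums(Lt^4) - colSums(Lt^2)^2 / nrow(L)) / nrow(L)
    }

    rot <- varimax(L.ml)              # default normalize = TRUE (Kaiser)
    c(before = varimax_criterion(L.ml),
      after  = varimax_criterion(unclass(rot$loadings)))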
Factor Scores

- Interpretation of a factor analysis is usually based on the factor loadings.
- Sometimes we also need the (estimated) values of the unobserved factors for further analysis: the factor scores.
- In Principal Components Analysis, typically the principal components themselves are used, scaled to have variance 1.
- In other types of factor analysis, two methods are used.

Bartlett's Weighted Least Squares

- Suppose that in the equation $X - \mu = L F + \epsilon$, $L$ is known.
- We can view the equation as a regression of $X$ on $L$, with coefficients $F$ and heteroscedastic errors with variance matrix $\Psi$.
- This suggests using
  \[
  \hat{f} = \left(L' \Psi^{-1} L\right)^{-1} L' \Psi^{-1} (x - \mu)
  \]
  to estimate $F$.
- With $L$, $\Psi$, and $\mu$ replaced by estimates, and for the $j$th observation $x_j$, this gives
  \[
  \hat{f}_j = \left(\hat{L}' \hat{\Psi}^{-1} \hat{L}\right)^{-1} \hat{L}' \hat{\Psi}^{-1} (x_j - \bar{x})
  \]
  as estimated values of the factors.
- The sample mean of the scores is 0.
- If the factor loadings are ML estimates, $\hat{L}' \hat{\Psi}^{-1} \hat{L}$ is a diagonal matrix $\hat{\Delta}$, and the sample covariance matrix of the scores is
  \[
  \frac{n}{n-1} \left(I + \hat{\Delta}^{-1}\right).
  \]
  In particular, the sample correlations of the factor scores are zero.

Regression Method

- The second method depends on the normal distribution assumption.
- $X$ and $F$ have a joint multivariate normal distribution, so the conditional distribution of $F$ given $X$ is also multivariate normal.
- The Best Linear Unbiased Predictor is the conditional mean.
- This leads to
  \[
  \hat{f}_j = \hat{L}' \left(\hat{L} \hat{L}' + \hat{\Psi}\right)^{-1} (x_j - \bar{x})
  = \left(I + \hat{L}' \hat{\Psi}^{-1} \hat{L}\right)^{-1} \hat{L}' \hat{\Psi}^{-1} (x_j - \bar{x}).
  \]
- The two methods are related by
  \[
  \hat{f}_j^{LS} = \left(I + \left(\hat{L}' \hat{\Psi}^{-1} \hat{L}\right)^{-1}\right) \hat{f}_j^{R}.
  \]
- In proc factor, use out = <data set name> on the proc factor statement; proc factor uses the regression method.
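- Given $\hat{L}$ and $\hat{\Psi}$, both score types are one-liners. A sketch, reusing X, m, L.ml, Psi.ml, and Delta from the earlier sketches; fB and fR are illustrative names. factanal() can also return either type directly via its scores = "Bartlett" or scores = "regression" argument.

    Z <- scale(X)                 # standardized data, playing x_j - xbar

    # Bartlett / WLS scores: (L' Psi^-1 L)^-1 L' Psi^-1 (x - xbar),
    # computed for all observations at once (rows of Z)
    fB <- Z %*% (L.ml / Psi.ml) %*% solve(Delta)

    # Regression-method scores: L'(LL' + Psi)^-1 (x - xbar)
    S  <- L.ml %*% t(L.ml) + diag(Psi.ml)
    fR <- Z %*% solve(S, L.ml)

    # The stated relation f_LS = (I + Delta^-1) f_R should hold exactly,
    # up to numerical error
    all.equal(unname(fB), unname(fR %*% (diag(m) + solve(Delta))))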