Factor Analysis
Statistics 784: Multivariate Analysis, NC State University
- Principal Components Analysis, e.g. of stock price movements, sometimes suggests that several variables may be responding to a small number of underlying forces.
- In the factor model, we assume that such latent variables, or factors, exist.
- The Orthogonal Factor Model equations:

  X1 − µ1 = l1,1 F1 + l1,2 F2 + · · · + l1,m Fm + ε1,
  X2 − µ2 = l2,1 F1 + l2,2 F2 + · · · + l2,m Fm + ε2,
  ...
  Xp − µp = lp,1 F1 + lp,2 F2 + · · · + lp,m Fm + εp,

  where:
  - F1, F2, . . . , Fm are the common factors (latent variables);
  - li,j is the loading of variable i, Xi, on factor j, Fj;
  - εi is a specific factor, affecting only Xi.
- In matrix form:

  X − µ = L F + ε,

  where X − µ and ε are (p × 1), L is (p × m), and F is (m × 1).
- To make this identifiable, we further assume, with no loss of generality:

  E(F) = 0 (m × 1),    Cov(F) = I (m × m),
  E(ε) = 0 (p × 1),    Cov(ε, F) = 0 (p × m).
- and, with serious loss of generality:

  Cov(ε) = Ψ = diag(ψ1, ψ2, . . . , ψp).
- In terms of the observable variables X, these assumptions mean that

  E(X) = µ,    Cov(X) = Σ = L L′ + Ψ,

  with L (p × m) and Ψ (p × p). Usually X is standardized, so Σ = R.
- The observable X and the unobservable F are related by

  Cov(X, F) = L.
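- As a quick illustration (not from the slides), here is a minimal R sketch that simulates data from the orthogonal factor model, with made-up L and Ψ chosen only to satisfy the assumptions, and checks the implied moment identities numerically:

  ## Simulate X = F L' + eps and check Cov(X) = LL' + Psi, Cov(X, F) = L.
  set.seed(1)
  p <- 4; m <- 2; n <- 100000
  L   <- matrix(c(0.9, 0.8, 0.1, 0.2,
                  0.1, 0.2, 0.9, 0.8), nrow = p, ncol = m)
  Psi <- diag(c(0.18, 0.32, 0.18, 0.32))
  Fm  <- matrix(rnorm(n * m), n, m)                # common factors, Cov(F) = I
  eps <- matrix(rnorm(n * p), n, p) %*% sqrt(Psi)  # specific factors, Cov = Psi
  X   <- Fm %*% t(L) + eps                         # taking mu = 0 for simplicity
  max(abs(cov(X) - (L %*% t(L) + Psi)))            # ~ 0 for large n
  max(abs(cov(X, Fm) - L))                         # ~ 0 for large n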
- Some terminology: the (i, i) entry of the matrix equation Σ = LL′ + Ψ is

  σi,i = li,1² + li,2² + · · · + li,m² + ψi,

  where σi,i = Var(Xi), the sum of squared loadings is the communality, and ψi is the specific variance; that is,

  σi,i = hi² + ψi,

  where

  hi² = li,1² + li,2² + · · · + li,m²

  is the i-th communality.
- Note that if T is (m × m) orthogonal, then (LT)(LT)′ = LL′, so the loadings LT generate the same Σ as L: loadings are not unique.
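- This non-uniqueness is easy to verify numerically; a small sketch, reusing the made-up L from the simulation above:

  ## Any orthogonal T leaves LL' (and hence Sigma) unchanged.
  theta <- pi / 6
  Tmat  <- matrix(c(cos(theta), sin(theta),
                   -sin(theta), cos(theta)), 2, 2)       # orthogonal rotation
  max(abs((L %*% Tmat) %*% t(L %*% Tmat) - L %*% t(L)))  # ~ 0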
Existence of Factor Representation
- For any p, every (p × p) Σ can be factorized as

  Σ = LL′

  for (p × p) L, which is a factor representation with m = p and Ψ = 0; however, m = p is not much use: we usually want m ≪ p.
- For p = 3, every (3 × 3) Σ can be represented as

  Σ = LL′ + Ψ

  for (3 × 1) L, which is a factor representation with m = 1, but Ψ may have negative elements.
- In general, we can only approximate Σ by LL′ + Ψ.
- Principal components method: the spectral decomposition of Σ is

  Σ = EΛE′ = (EΛ^1/2)(EΛ^1/2)′ = LL′,

  with L = EΛ^1/2 and m = p.
- If λ1 + λ2 + · · · + λm ≫ λm+1 + · · · + λp, and L(m) is the first m columns of L, then

  Σ ≈ L(m) L(m)′

  gives such an approximation with Ψ = 0.
- The remainder term Σ − L(m) L(m)′ is non-negative definite, so its diagonal entries are non-negative ⇒ we can get a closer approximation as

  Σ ≈ L(m) L(m)′ + Ψ(m),

  where Ψ(m) = diag(Σ − L(m) L(m)′).
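- As an R sketch of this extraction (illustrative names, assuming a covariance or correlation matrix as input), eigen() supplies the spectral decomposition used above:

  ## Rank-m principal-components solution with Psi_(m) from the remainder.
  pc_factors <- function(Sigma, m) {
    eig  <- eigen(Sigma, symmetric = TRUE)
    Lm   <- eig$vectors[, 1:m, drop = FALSE] %*% diag(sqrt(eig$values[1:m]), m)
    Psim <- diag(diag(Sigma - Lm %*% t(Lm)))  # Psi_(m) = diag(Sigma - L_(m)L_(m)')
    list(loadings = Lm, Psi = Psim)
  }
  pc_factors(cov(X), m = 2)   # X from the simulation sketch above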
- SAS proc factor program and output:

  proc factor data = all method = prin;
    var cvx -- xom;
    title 'Method = Principal Components';
  proc factor data = all method = prin nfact = 2 plot;
    var cvx -- xom;
    title 'Method = Principal Components, 2 factors';
Principal Factor Solution
- Recall the Orthogonal Factor Model

  X = LF + ε, which implies Σ = LL′ + Ψ.

- The m-factor Principal Component solution is to approximate Σ (or, if we standardize the variables, R) by a rank-m matrix using the spectral decomposition

  Σ = λ1 e1 e1′ + · · · + λm em em′ + λm+1 em+1 em+1′ + · · · + λp ep ep′.

- The first m terms give the best rank-m approximation to Σ.
- We can sometimes achieve higher communalities (= diag(LL′)) by either:
  - specifying an initial estimate of the communalities,
  - iterating the solution,
  or both.
- Suppose we are working with R. Given initial communalities hi*², form the reduced correlation matrix

  Rr = [ h1*²   r1,2   · · ·   r1,p ]
       [ r2,1   h2*²   · · ·   r2,p ]
       [  ...    ...    ...     ... ]
       [ rp,1   rp,2   · · ·   hp*² ]

  i.e. R with its diagonal entries replaced by the initial communalities.
- Now use the spectral decomposition of Rr to find its best rank-m approximation

  Rr ≈ Lr* Lr*′.

- The new communalities are

  h̃i*² = li,1*² + · · · + li,m*².

- Find Ψ by equating the diagonal terms:

  ψ̃i* = 1 − h̃i*², or Ψ̃* = I − diag(Lr* Lr*′).
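- A sketch of this step in R, assuming a correlation matrix and using the standard identity SMCi = 1 − 1/(R⁻¹)i,i for the priors (analogous to priors = smc below); function and variable names are illustrative:

  ## One principal-factor pass: SMC priors, then a rank-m fit to Rr.
  principal_factor <- function(R, m) {
    h2  <- 1 - 1 / diag(solve(R))        # initial communalities (SMC)
    Rr  <- R; diag(Rr) <- h2             # reduced correlation matrix
    eig <- eigen(Rr, symmetric = TRUE)
    Lr  <- eig$vectors[, 1:m, drop = FALSE] %*%
           diag(sqrt(pmax(eig$values[1:m], 0)), m)  # Rr may be indefinite
    list(loadings = Lr, communalities = rowSums(Lr^2))
  }
  principal_factor(cor(X), m = 2)   # X from the earlier simulation sketch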
- This is the Principal Factor solution.
- The Principal Component solution is the special case where the initial communalities are all 1.
- In proc factor, use method = prin as for the Principal Component solution, but also specify the initial communalities:
  - the priors = ... option on the proc factor statement specifies a method, such as squared multiple correlations (priors = SMC);
  - the priors statement provides explicit numerical values.
- SAS program and output:

  proc factor data = all method = prin priors = smc;
    title 'Method = Principal Factors';
    var cvx -- xom;

- In this case, the communalities are smaller than for the Principal Component solution.
- Other choices for the priors option include:
  - MAX ⇒ maximum absolute correlation with any other variable;
  - ASMC ⇒ adjusted SMC (adjusted to make their sum equal to the sum of the maximum absolute correlations);
  - ONE ⇒ 1;
  - RANDOM ⇒ uniform on (0, 1).
Iterated Principal Factors
- One issue with both Principal Components and Principal Factors: if S or R is exactly of the form LL′ + Ψ (or, more likely, approximately of that form), neither method recovers L and Ψ (unless you specify the true communalities).
- Solution: iterate! (See the sketch after this list.)
  - Use the new communalities as initial communalities to get another set of Principal Factors.
  - Repeat until nothing much changes.
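- Here is that sketch: the principal-factor pass from above, iterated until the communalities stabilize. It is a rough analogue of method = prinit, not the SAS implementation; names and the convergence rule are illustrative:

  ## Iterated principal factors: feed each pass's communalities back in.
  iterate_pf <- function(R, m, start = rep(1, ncol(R)),  # default ONE
                         tol = 1e-6, maxit = 200) {
    h2 <- start
    for (k in seq_len(maxit)) {
      Rr  <- R; diag(Rr) <- h2
      eig <- eigen(Rr, symmetric = TRUE)
      Lr  <- eig$vectors[, 1:m, drop = FALSE] %*%
             diag(sqrt(pmax(eig$values[1:m], 0)), m)
      h2_new <- rowSums(Lr^2)
      if (max(abs(h2_new - h2)) < tol) break   # "nothing much changes"
      h2 <- h2_new
    }
    list(loadings = Lr, communalities = h2_new, iterations = k)
  }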
- In proc factor, use method = prinit; you may also specify the initial communalities (default = ONE).
- SAS program and output:

  proc factor data = all method = prinit;
    title 'Method = Iterated Principal Factors';
    var cvx -- xom;

- The communalities are still smaller than for the Principal Component solution, but larger than for Principal Factors.
Likelihood Methods
- If we assume that X ∼ Np(µ, Σ) with Σ = LL′ + Ψ, we can fit by maximum likelihood:
  - µ̂ = x̄;
  - L is not identified without a constraint (uniqueness condition) such as L′Ψ⁻¹L = diagonal;
  - there is still no closed-form equation for L̂; numerical optimization is required.
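- In R, factanal() fits this model by maximum likelihood; a brief sketch, with dow standing in for whatever data matrix is in use:

  ## ML factor analysis; factanal() works with the correlation matrix and
  ## handles the identification constraint internally.
  fit <- factanal(x = dow, factors = 2, rotation = "none")
  fit$loadings       # L-hat
  fit$uniquenesses   # the psi-hat_i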
- We can also test hypotheses about m with the likelihood ratio test (Bartlett's correction improves the χ² approximation):
  - H0: m = m0; HA: m > m0;
  - −2 × log likelihood ratio ∼ χ² with ½[(p − m0)² − p − m0] degrees of freedom;
  - degrees of freedom > 0 ⇐⇒ m0 < ½(2p + 1 − √(8p + 1));
  - e.g. for p = 5, m0 < 2.298 ⇒ m0 ≤ 2:

    p   m0   degrees of freedom
    5    0          10
    5    1           5
    5    2           1
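- The degrees-of-freedom formula and the bound on m0 are easy to tabulate; a small R sketch reproducing the table above:

  ## Degrees of freedom for H0: m = m0, and the largest usable m0.
  lrt_df   <- function(p, m0) ((p - m0)^2 - p - m0) / 2
  m0_bound <- function(p) (2 * p + 1 - sqrt(8 * p + 1)) / 2
  lrt_df(5, 0:2)   # 10 5 1, matching the table
  m0_bound(5)      # 2.298..., so m0 <= 2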
- In proc factor, use method = ml; you may also specify the initial communalities (default = SMC). SAS program and output:

  proc factor data = all method = ml;
    var cvx -- xom;
    title 'Method = Maximum Likelihood';
  proc factor data = all method = ml heywood plot;
    var cvx -- xom;
    title 'Method = Maximum Likelihood with Heywood fixup';
  proc factor data = all method = ml ultraheywood plot;
    var cvx -- xom;
    title 'Method = Maximum Likelihood with Ultra-Heywood fixup';
- Note that the iteration can produce communalities > 1!
- Two fixes:
  - the Heywood option on the proc factor statement caps the communalities at 1;
  - the UltraHeywood option on the proc factor statement allows the iteration to continue with communalities > 1.
Scaling and the Likelihood
- If the maximum likelihood estimates for an (n × p) data matrix X are L̂ and Ψ̂, and Y = XD is a scaled data matrix, with the columns of X scaled by the entries of the (p × p) diagonal matrix D, then the maximum likelihood estimates for Y are DL̂ and D²Ψ̂.
- That is, the MLEs are equivariant under scaling:

  Σ̂Y = D Σ̂X D.
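- The covariance transformation itself is immediate to check numerically; a two-line sketch with an arbitrary made-up D, reusing X and p from the earlier simulation:

  ## Column scaling Y = XD turns the sample covariance into D cov(X) D,
  ## which is why the fitted (L, Psi) rescale to (D L, D^2 Psi).
  D <- diag(runif(p, 0.5, 2))
  Y <- X %*% D
  max(abs(cov(Y) - D %*% cov(X) %*% D))   # ~ 0 up to rounding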
- Proof: LY(µ, Σ) = LX(D⁻¹µ, D⁻¹ΣD⁻¹).
- In particular, there is no real distinction between analyzing the covariance matrix and the correlation matrix.
Weighting and the Likelihood
- Recall the uniqueness condition

  L′Ψ⁻¹L = ∆, diagonal.

- Write

  Σ* = Ψ^−1/2 Σ Ψ^−1/2
     = Ψ^−1/2 (LL′ + Ψ) Ψ^−1/2
     = (Ψ^−1/2 L)(Ψ^−1/2 L)′ + Ip
     = L* L*′ + Ip.

- Σ* is the weighted covariance matrix.
- Here

  L* = Ψ^−1/2 L and L*′ L* = L′Ψ⁻¹L = ∆.

- Note:

  Σ* L* = L* L*′ L* + L*
        = L* ∆ + L*
        = L* (∆ + Im),

  so the columns of L* are the (unnormalized) eigenvectors of Σ*, the weighted covariance matrix.
- Also

  (Σ* − Ip) L* = L* ∆,

  so the columns of L* are also the eigenvectors of

  Σ* − Ip = Ψ^−1/2 (Σ − Ψ) Ψ^−1/2,

  the weighted reduced covariance matrix.
- Since the likelihood analysis is transparent to scaling, the weighted reduced correlation matrix gives essentially the same results as the weighted reduced covariance matrix.
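- A numeric sketch of the eigenvector relation, reusing the made-up L and Ψ from the simulation sketch. That L was not built to satisfy the uniqueness condition, so the sketch first rotates it to make L′Ψ⁻¹L diagonal (which leaves LL′ unchanged):

  ## Verify (Sigma* - I) L* = L* Delta under the uniqueness condition.
  Sigma <- L %*% t(L) + Psi
  W     <- diag(1 / sqrt(diag(Psi)))            # Psi^{-1/2}
  rot   <- eigen(t(L) %*% solve(Psi) %*% L, symmetric = TRUE)$vectors
  Lu    <- L %*% rot                            # now Lu' Psi^{-1} Lu is diagonal
  Lstar <- W %*% Lu                             # L* = Psi^{-1/2} L
  Delta <- t(Lu) %*% solve(Psi) %*% Lu          # the diagonal Delta
  max(abs((W %*% Sigma %*% W - diag(p)) %*% Lstar - Lstar %*% Delta))  # ~ 0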
Factor Rotation
- In the orthogonal factor model X − µ = LF + ε, the factor loadings are not always easily interpreted.
- J&W (p. 504):

  "Ideally, we should like to see a pattern of loadings such that each variable loads highly on a single factor and has small to moderate loadings on the remaining factors."

- That is, each row of L should have a single large entry.
- Recall from the corresponding equation

  Σ = LL′ + Ψ

  that L and LT give the same Σ for any orthogonal T.
- We can choose T to make the rotated loadings LT more readily interpreted.
- Note that rotation changes neither Σ nor Ψ; hence the communalities are also unchanged.
The Varimax Criterion
- Kaiser proposed a criterion that measures interpretability:
  - L̂ is some set of loadings with communalities ĥi², i = 1, 2, . . . , p;
  - L̂* is a set of rotated loadings, L̂* = L̂T;
  - l̃i,j* = l̂i,j* / ĥi are the scaled loadings;
  - the criterion (computed in the sketch after this list) is

    V = (1/p) Σj=1..m [ Σi=1..p l̃i,j*⁴ − (1/p) (Σi=1..p l̃i,j*²)² ].
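- Here is that sketch: V computed directly from a loading matrix, then a check that R's varimax() does not decrease it. The function name and the reuse of Lu from the earlier sketch are illustrative, and h2 assumes the matrix holds all m factors:

  ## Kaiser's varimax criterion from scaled loadings l-tilde.
  varimax_V <- function(lam, h2 = rowSums(lam^2)) {
    lt <- lam / sqrt(h2)                      # scale row i by h_i
    sum(colMeans(lt^4) - colMeans(lt^2)^2)    # = (1/p) sum_j [ ... ]
  }
  V0 <- varimax_V(Lu)
  V1 <- varimax_V(Lu %*% varimax(Lu, normalize = FALSE)$rotmat)
  c(V0, V1)                                   # V1 >= V0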
- Note that the term in brackets is the variance of the l̃i,j*² in column j.
- Making this variance large tends to produce two clusters of scaled loadings, one of small values and one of large values.
- So each column of the rotated loading matrix tends to contain:
  - a group of large loadings, which identify the variables associated with the factor;
  - remaining loadings that are small.
- Example: weekly returns for the 30 Dow Industrials stocks from January 2005 to March 2007 (115 returns).
- R code to rotate Principal Components 2–10:

  dowPrcomp = prcomp(dow, scale. = TRUE)
  dowVmax = varimax(dowPrcomp$rotation[, 2:10], normalize = FALSE)
  loadings(dowVmax)

- Note: when R prints the loadings, entries with absolute value below a cutoff (default: 0.1) are printed as blanks, to draw attention to the larger values.
  [R loadings output: the 30 Dow stocks (AA through XOM) by rotated components PC2–PC10, with entries below 0.1 in absolute value shown as blanks.]
- In proc factor, use rotate = varimax; you may also request plots both before (preplot) and after (plot) rotation.
- SAS program and output:

  proc factor data = all method = prinit nfact = 2
    rotate = varimax preplot plot out = stout;
    title 'Method = Iterated Principal Factors with Varimax Rotation';
    var cvx -- xom;
Factor Scores
- Interpretation of a factor analysis is usually based on the factor loadings.
- Sometimes we need the (estimated) values of the unobserved factors for further analysis: the factor scores.
- In Principal Components Analysis, typically the principal components themselves are used, scaled to have variance 1.
- In other types of factor analysis, two methods are used.
Bartlett’s Weighted Least Squares
- Suppose that in the equation

  X − µ = LF + ε,

  L is known.
- We can view the equation as a regression of X on L, with coefficients F and heteroscedastic errors with variance matrix Ψ.
- This suggests using

  f̂ = (L′Ψ⁻¹L)⁻¹ L′Ψ⁻¹ (x − µ)

  to estimate F.
- With L, Ψ, and µ replaced by estimates, and for the j-th observation xj, this gives

  f̂j = (L̂′Ψ̂⁻¹L̂)⁻¹ L̂′Ψ̂⁻¹ (xj − x̄)

  as estimated values of the factors.
- The sample mean of the scores is 0.
- If the factor loadings are ML estimates, L̂′Ψ̂⁻¹L̂ is a diagonal matrix ∆̂, and the sample covariance matrix of the scores is

  (n / (n − 1)) (I + ∆̂⁻¹).

  In particular, the sample correlations of the factor scores are zero.
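- A sketch of these scores in R; factanal(..., scores = "Bartlett") computes them directly, and the explicit version below (with illustrative names) just follows the formula:

  ## Bartlett WLS factor scores; X is the data, L and Psi are estimates.
  bartlett_scores <- function(X, L, Psi) {
    Xc <- scale(X, center = TRUE, scale = FALSE)   # x_j - xbar
    A  <- solve(t(L) %*% solve(Psi) %*% L)         # (L'Psi^{-1}L)^{-1}
    t(A %*% t(L) %*% solve(Psi) %*% t(Xc))         # one row of scores per x_j
  }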
Regression Method
- The second method depends on the normal distribution assumption.
- X and F have a joint multivariate normal distribution ⇒ the conditional distribution of F given X is also multivariate normal.
- The Best Linear Unbiased Predictor of F is the conditional mean.
- This leads to

  f̂j = L̂′ (L̂L̂′ + Ψ̂)⁻¹ (xj − x̄)
      = (I + L̂′Ψ̂⁻¹L̂)⁻¹ L̂′Ψ̂⁻¹ (xj − x̄).

- The two methods are related by

  f̂jLS = (I + (L̂′Ψ̂⁻¹L̂)⁻¹) f̂jR.
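- A matching sketch of the regression scores, with a commented check of the stated relation; it reuses bartlett_scores() from above, and all names are illustrative:

  ## Regression-method factor scores.
  regression_scores <- function(X, L, Psi) {
    Xc <- scale(X, center = TRUE, scale = FALSE)
    t(t(L) %*% solve(L %*% t(L) + Psi) %*% t(Xc))
  }
  ## Check of the relation (rows are observations; M is symmetric):
  ## fLS <- bartlett_scores(X, L, Psi)
  ## fR  <- regression_scores(X, L, Psi)
  ## M   <- diag(ncol(L)) + solve(t(L) %*% solve(Psi) %*% L)
  ## max(abs(fLS - fR %*% M))   # ~ 0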
- In proc factor, use out = <data set name> on the proc factor statement; proc factor uses the regression method.