Object Orie’d Data Analysis, Last Time
• Gene Cell Cycle Data
• Microarrays and HDLSS visualization
• DWD bias adjustment
• NCI 60 Data
Today: More NCI 60 Data & Detailed (math’cal) look at PCA
Last Time: Checked Data Combo, using DWD Dir’ns
DWD Views of NCI 60 Data
Interesting Question:
Which clusters are really there?
Issues:
• DWD great at finding dir’ns of separation
• And will do so even if no real structure
• Is this happening here?
• Or: which clusters are important?
• What does “important” mean?
Real Clusters in NCI 60 Data
Simple Visual Approach:
• Randomly relabel data (Cancer Types)
• Recompute DWD dir’ns & visualization
• Get heuristic impression from this (see the sketch below)
Deeper Approach
• Formal Hypothesis Testing
(Done later)
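A minimal sketch of the random-relabelling idea above (hypothetical code, not from the slides): since DWD itself is a complex method, a simple mean-difference direction stands in for the DWD direction here, and `separation_direction` / `random_relabel_views` are illustrative names.

```python
import numpy as np

rng = np.random.default_rng(0)

def separation_direction(X, labels):
    """Stand-in for DWD: the mean-difference direction between two groups.
    (The slides use DWD; any direction-finding method fits in this slot.)"""
    d = X[labels == 1].mean(axis=0) - X[labels == 0].mean(axis=0)
    return d / np.linalg.norm(d)

def random_relabel_views(X, labels, n_reps=4):
    """Recompute the separating direction under random relabellings.
    Projections onto these directions give a 'no real structure' baseline."""
    views = []
    for _ in range(n_reps):
        fake = rng.permutation(labels)      # randomly relabel the data
        w = separation_direction(X, fake)   # re-find a separating direction
        views.append(X @ w)                 # 1-d projection for visualization
    return views

# Toy usage: X is (cases x genes), labels in {0, 1}.
X = rng.normal(size=(60, 100))
labels = np.r_[np.zeros(30), np.ones(30)]
views = random_relabel_views(X, labels)
```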
Random Relabelling #1
Random Relabelling #2
Random Relabelling #3
Random Relabelling #4
Revisit Real Data
Revisit Real Data (Cont.)
Heuristic Results:
Strong Clust’s
Weak Clust’s
Not Clust’s
Melanoma
CNS
NSCLC
Leukemia
Ovarian
Breast
Renal
Colon
Later: will find way to quantify these ideas
i.e. develop statistical significance
NCI 60 Controversy
• Can NCI 60 Data be normalized?
• Negative Indication:
• Kou, et al (2002) Bioinformatics, 18, 405-412.
– Based on Gene by Gene Correlations
• Resolution:
Gene by Gene Data View
vs.
Multivariate Data View
Resolution of Paradox: Toy Data, Gene View
Resolution: Correlations suggest “no chance”
Resolution: Toy Data, PCA View
Resolution: PCA & DWD direct’ns
Resolution: DWD Adjusted
Resolution: DWD Adjusted, PCA view
Resolution: DWD Adjusted, Gene view
Resolution: Correlations & PC1 Projection Correl’n
Needed final verification of Cross-platform Normal’n
• Is statistical power actually improved?
• Will study later
DWD: Why does it work?
Rob Tibshirani Query:
• Really need that complicated stuff?
(DWD is complex)
• Can’t we just use means?
• Empirical Fact (Joel Parker):
(DWD better than simple methods)
DWD: Why does it work?
Xuxin Liu Observation:
• Key is unbalanced sub-sample sizes
(e.g. biological subtypes)
• Mean methods strongly affected
• DWD much more robust
• Toy Example
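A hedged sketch of the kind of toy example meant here (the actual example is Xuxin Liu’s; this particular construction and its numbers are assumed for illustration): two batches share two subtypes, and when the subtype proportions are unbalanced, subtracting batch means leaves residual subtype bias.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two subtypes (symbols) separated along one coordinate; two batches
# (colors) differ by a shift. All numbers are made up for illustration.
def make_batch(n_a, n_b, batch_shift):
    subtype = np.r_[np.zeros(n_a), np.ones(n_b)]
    x = rng.normal(size=n_a + n_b) + 5 * subtype + batch_shift
    return x, subtype

x1, s1 = make_batch(50, 50, batch_shift=0.0)   # balanced subtypes
x2, s2 = make_batch(90, 10, batch_shift=2.0)   # unbalanced (ratio ~ 0.11)

# Mean adjustment subtracts each batch's mean; the unbalanced batch's mean
# is dominated by subtype 0, so the adjustment also shifts subtypes apart.
x1_adj, x2_adj = x1 - x1.mean(), x2 - x2.mean()
print(x1_adj[s1 == 0].mean() - x2_adj[s2 == 0].mean())  # residual bias ~ -2
```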
DWD: Why does it work?
Xuxin Liu Example
• Goals:
– Bring colors together
– Keep symbols distinct (interesting biology)
• Study varying sub-sample proportions (ratios 1, 0.61, 0.35, 0.11):
Ratio = 1: Both methods great
Ratio = 0.61: Mean degrades, DWD good
Ratio = 0.35: Mean poor, DWD still OK
Ratio = 0.11: DWD degraded, still better
• Later: will find underlying theory
PCA: Rediscovery – Renaming
Statistics:
Principal Component Analysis (PCA)
Social Sciences:
Factor Analysis (PCA is a subset)
Probability / Electrical Eng:
Karhunen-Loève expansion
Applied Mathematics:
Proper Orthogonal Decomposition (POD)
Geo-Sciences:
Empirical Orthogonal Functions (EOF)
An Interesting Historical Note
The 1st (?) application of PCA to Functional
Data Analysis:
Rao, C. R. (1958) Some statistical methods
for comparison of growth curves,
Biometrics, 14, 1-17.
1st Paper with “Curves as Data” viewpoint
Detailed Look at PCA
Three important (and interesting) viewpoints:
1. Mathematics
2. Numerics
3. Statistics
1st: Review linear alg. and multivar. prob.
Review of Linear Algebra
Vector Space:
• set of “vectors”, $x$
• and “scalars” (coefficients), $a$
• “closed” under “linear combination” ($\sum_i a_i x_i$ in space)
• e.g. $\mathbb{R}^d = \{ (x_1, \ldots, x_d)^t : x_1, \ldots, x_d \in \mathbb{R} \}$, “$d$ dim Euclid’n space”
Review of Linear Algebra (Cont.)
Subspace:
• subset that is again a vector space
• i.e. closed under linear combination
• e.g. lines through the origin
• e.g. planes through the origin
• e.g. subsp. “generated by” a set of vectors
(all linear combos of them = containing hyperplane through origin)
Review of Linear Algebra (Cont.)
Basis of subspace: set of vectors that:
• span, i.e. everything is a lin. com. of them
• are linearly indep’t, i.e. lin. com. is unique
• e.g. “unit vector basis” of $\mathbb{R}^d$:
$\begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \ldots, \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}$
• since
$\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \end{pmatrix} = x_1 \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} + x_2 \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix} + \cdots + x_d \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}$
Review of Linear Algebra (Cont.)
Basis Matrix, of subspace of $\mathbb{R}^d$:
Given a basis $v_1, \ldots, v_n$, create matrix of columns:
$B = \begin{pmatrix} v_1 & \cdots & v_n \end{pmatrix} = \begin{pmatrix} v_{11} & \cdots & v_{1n} \\ \vdots & & \vdots \\ v_{d1} & \cdots & v_{dn} \end{pmatrix}_{d \times n}$
Review of Linear Algebra (Cont.)
Then “linear combo” is a matrix multiplicat’n:
$\sum_{i=1}^n a_i v_i = B a$, where $a = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}$
Check sizes: $(d \times 1) = (d \times n)(n \times 1)$
Review of Linear Algebra (Cont.)
Aside on matrix multiplication: (linear transformat’n)
For matrices
$A = \begin{pmatrix} a_{1,1} & \cdots & a_{1,m} \\ \vdots & & \vdots \\ a_{k,1} & \cdots & a_{k,m} \end{pmatrix}$, $B = \begin{pmatrix} b_{1,1} & \cdots & b_{1,n} \\ \vdots & & \vdots \\ b_{m,1} & \cdots & b_{m,n} \end{pmatrix}$
Define the “matrix product”
$AB = \begin{pmatrix} \sum_{i=1}^m a_{1,i} b_{i,1} & \cdots & \sum_{i=1}^m a_{1,i} b_{i,n} \\ \vdots & & \vdots \\ \sum_{i=1}^m a_{k,i} b_{i,1} & \cdots & \sum_{i=1}^m a_{k,i} b_{i,n} \end{pmatrix}$
(“inner products” of rows with columns)
(composition of linear transformations)
Often useful to check sizes: $(k \times n) = (k \times m)(m \times n)$
Review of Linear Algebra (Cont.)
Matrix trace:
• For a square matrix $A = \begin{pmatrix} a_{1,1} & \cdots & a_{1,m} \\ \vdots & & \vdots \\ a_{m,1} & \cdots & a_{m,m} \end{pmatrix}$
• Define $\mathrm{tr}(A) = \sum_{i=1}^m a_{i,i}$
• Trace commutes with matrix multiplication: $\mathrm{tr}(AB) = \mathrm{tr}(BA)$
Review of Linear Algebra (Cont.)
Dimension of subspace (a notion of “size”):
• number of elements in a basis (unique)
• $\dim(\mathbb{R}^d) = d$ (use basis above)
• e.g. dim of a line is 1
• e.g. dim of a plane is 2
• dimension is “degrees of freedom”
Review of Linear Algebra (Cont.)
Norm of a vector:
• in $\mathbb{R}^d$: $\|x\| = \left( \sum_{j=1}^d x_j^2 \right)^{1/2} = (x^t x)^{1/2}$
• Idea: “length” of the vector
• Note: strange properties for high $d$,
e.g. “length of diagonal of unit cube” $= \sqrt{d}$
Review of Linear Algebra (Cont.)
Norm of a vector (cont.):
• “length normalized vector”: $\frac{x}{\|x\|}$
(has length one, thus on surf. of unit sphere & is a direction vector)
• get “distance” as: $d(x, y) = \|x - y\| = \left( (x - y)^t (x - y) \right)^{1/2}$
Review of Linear Algebra (Cont.)
Inner (dot, scalar) product:
• for vectors $x$ and $y$: $\langle x, y \rangle = \sum_{j=1}^d x_j y_j = x^t y$
• related to norm, via $\|x\| = \langle x, x \rangle^{1/2} = (x^t x)^{1/2}$
Review of Linear Algebra (Cont.)
Inner (dot, scalar) product (cont.):
• measures “angle between $x$ and $y$” as:
$\mathrm{angle}(x, y) = \cos^{-1}\!\left( \frac{\langle x, y \rangle}{\|x\| \, \|y\|} \right) = \cos^{-1}\!\left( \frac{x^t y}{\sqrt{x^t x} \, \sqrt{y^t y}} \right)$
• key to “orthogonality”, i.e. “perpendicul’ty”:
$x \perp y$ if and only if $\langle x, y \rangle = 0$
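A small numpy sketch of the angle formula (`angle` is an illustrative helper, not a library function):

```python
import numpy as np

def angle(x, y):
    """Angle between vectors, from the inner product formula above."""
    c = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

x, y = np.array([1.0, 0.0]), np.array([1.0, 1.0])
print(angle(x, y))                                 # 45 degrees
print(np.isclose(x @ np.array([0.0, 1.0]), 0.0))   # orthogonal: inner prod. 0
```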
Review of Linear Algebra (Cont.)
Orthonormal basis $v_1, \ldots, v_n$:
• All ortho to each other, i.e. $\langle v_i, v_{i'} \rangle = 0$, for $i \neq i'$
• All have length 1, i.e. $\langle v_i, v_i \rangle = 1$, for $i = 1, \ldots, n$
Review of Linear Algebra (Cont.)
Orthonormal basis $v_1, \ldots, v_n$ (cont.):
• “Spectral Representation”: $x = \sum_{i=1}^n a_i v_i$, where $a_i = \langle x, v_i \rangle$
(check: $\langle x, v_i \rangle = \left\langle \sum_{i'=1}^n a_{i'} v_{i'}, v_i \right\rangle = \sum_{i'=1}^n a_{i'} \langle v_{i'}, v_i \rangle = a_i$)
• Matrix notation: $x = B a$, where $a^t = x^t B$, i.e. $a = B^t x$
• $a$ is called “transform (e.g. Fourier, wavelet) of $x$”
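A minimal sketch of the transform and its inversion, assuming an orthonormal basis matrix $B$ (toy example):

```python
import numpy as np

# Orthonormal basis of a 2-d subspace of R^3 (columns of B).
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
x = np.array([3.0, 4.0, 0.0])   # x lies in the subspace

a = B.T @ x                      # transform: a_i = <x, v_i>
x_recon = B @ a                  # spectral representation sum_i a_i v_i
print(np.allclose(x_recon, x))   # True
```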
Review of Linear Algebra (Cont.)
Parseval identity, for $x$ in subsp. gen’d by o.n. basis $v_1, \ldots, v_n$:
$\|x\|^2 = \sum_{i=1}^n \langle x, v_i \rangle^2 = \sum_{i=1}^n a_i^2 = \|a\|^2$
• Pythagorean theorem
• “Decomposition of Energy”
• ANOVA - sums of squares
• Transform $a$ has same length as $x$, i.e. “rotation in $\mathbb{R}^d$”
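Continuing the toy example above, a one-line check that the transform preserves squared length:

```python
import numpy as np

# For x in the span of an orthonormal basis, the transform a has the
# same length as x (Parseval / Pythagoras).
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
x = np.array([3.0, 4.0, 0.0])
a = B.T @ x
print(np.isclose(np.sum(a**2), np.sum(x**2)))   # True: 25 = 25
```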
Review of Linear Algebra (Cont.)
Gram-Schmidt Ortho-normalization
Idea: Given a basis $v_1, \ldots, v_n$, find an orthonormal version, by subtracting non-ortho parts:
$u_1 = v_1 / \|v_1\|$
$u_2 = \left( v_2 - \langle v_2, u_1 \rangle u_1 \right) / \left\| v_2 - \langle v_2, u_1 \rangle u_1 \right\|$
$u_3 = \left( v_3 - \langle v_3, u_1 \rangle u_1 - \langle v_3, u_2 \rangle u_2 \right) / \left\| v_3 - \langle v_3, u_1 \rangle u_1 - \langle v_3, u_2 \rangle u_2 \right\|$
(and so on)
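A sketch implementation of the recursion above (the “modified” variant, which subtracts from the running residual; it is numerically safer and agrees with the formulas in exact arithmetic):

```python
import numpy as np

def gram_schmidt(V):
    """Orthonormalize the columns of V, as in the recursion above."""
    U = np.zeros_like(V, dtype=float)
    for j in range(V.shape[1]):
        u = V[:, j].astype(float)
        for i in range(j):                  # subtract non-ortho parts
            u = u - (u @ U[:, i]) * U[:, i]
        U[:, j] = u / np.linalg.norm(u)     # normalize to length one
    return U

V = np.array([[1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
U = gram_schmidt(V)
print(np.allclose(U.T @ U, np.eye(2)))      # columns are orthonormal
```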
Review of Linear Algebra (Cont.)
Projection of a vector $x$ onto a subspace $V$:
• Idea: member of $V$ that is closest to $x$ (i.e. “approx’n”)
• Find $P_V x \in V$ that solves: $\min_{v \in V} \|x - v\|$ (“least squares”)
• For inner product (Hilbert) space: $P_V x$ exists and is unique
Review of Linear Algebra (Cont.)
Projection of a vector onto a subspace (cont.):
• General solution in $\mathbb{R}^d$: for basis matrix $B_V$,
$P_V x = B_V (B_V^t B_V)^{-1} B_V^t x$
• So “proj’n operator” is “matrix mult’n”: $P_V = B_V (B_V^t B_V)^{-1} B_V^t$
(thus projection is another linear operation)
(note same operation underlies least squares)
Review of Linear Algebra (Cont.)
Projection using orthonormal basis $v_1, \ldots, v_n$:
• Basis matrix is “orthonormal”:
$B_V^t B_V = \begin{pmatrix} v_1^t \\ \vdots \\ v_n^t \end{pmatrix} \begin{pmatrix} v_1 & \cdots & v_n \end{pmatrix} = \begin{pmatrix} \langle v_1, v_1 \rangle & \cdots & \langle v_1, v_n \rangle \\ \vdots & & \vdots \\ \langle v_n, v_1 \rangle & \cdots & \langle v_n, v_n \rangle \end{pmatrix} = \begin{pmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{pmatrix} = I_{n \times n}$
• So $P_V x = B_V B_V^t x$ = Recon(Coeffs of $x$ “in $V$ dir’n”)
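A sketch checking both projection formulas: the general $B_V (B_V^t B_V)^{-1} B_V^t$ form from the previous slide, and its $B_V B_V^t$ simplification for an orthonormal basis (obtained here via a QR factorization):

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.normal(size=(5, 2))                 # basis matrix of a subspace V
x = rng.normal(size=5)

# General projection formula: P_V = B (B^t B)^{-1} B^t
P = B @ np.linalg.inv(B.T @ B) @ B.T
print(np.allclose(P @ (P @ x), P @ x))      # projecting twice changes nothing

# With an orthonormal basis, B^t B = I and the formula reduces to B B^t.
Q, _ = np.linalg.qr(B)                      # orthonormal basis for the same V
print(np.allclose(Q @ (Q.T @ x), P @ x))    # same projection
```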
Review of Linear Algebra (Cont.)
Projection using orthonormal basis (cont.):
• For “orthogonal complement” $V^\perp$:
$x = P_V x + P_{V^\perp} x$ and $\|x\|^2 = \|P_V x\|^2 + \|P_{V^\perp} x\|^2$
• Parseval inequality:
$\|P_V x\|^2 = \sum_{i=1}^n \langle x, v_i \rangle^2 = \sum_{i=1}^n a_i^2 = \|a\|^2 \leq \|x\|^2$
Review of Linear Algebra (Cont.)
(Real) Unitary Matrices: $U_{d \times d}$ with $U^t U = I$
• Orthonormal basis matrix (so all of above applies)
• Follows that $U U^t = I$ (since $U$ has full rank, so $U^{-1}$ exists …)
• Lin. trans. (mult. by $U$) is like “rotation” of $\mathbb{R}^d$
• But also includes “mirror images”
Review of Linear Algebra (Cont.)
Singular Value Decomposition (SVD):
For a matrix $X_{d \times n}$:
Find a diagonal matrix $S_{d \times n}$, with entries $s_1, \ldots, s_{\min(d,n)}$, called singular values
And unitary (rotation) matrices $U_{d \times d}$, $V_{n \times n}$ (recall $U^t U = V^t V = I$)
so that $X = U S V^t$
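A numpy sketch of the decomposition (note `np.linalg.svd` returns $V^t$ rather than $V$):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(5, 3))                  # d = 5, n = 3

# full_matrices=True gives U (5x5), V^t (3x3), and min(d,n)=3 singular values.
U, s, Vt = np.linalg.svd(X, full_matrices=True)
S = np.zeros((5, 3))
S[:3, :3] = np.diag(s)                       # embed s on the diagonal of S_{dxn}
print(np.allclose(U @ S @ Vt, X))            # X = U S V^t
print(np.allclose(U.T @ U, np.eye(5)), np.allclose(Vt @ Vt.T, np.eye(3)))
```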
Review of Linear Algebra (Cont.)
Intuition behind Singular Value Decomposition:
• For $X$ a “linear transf’n” (via matrix multi’n):
$X v = U S V^t v = U \left( S \left( V^t v \right) \right)$
• First rotate (by $V^t$)
• Second rescale coordinate axes (by the $s_i$)
• Third rotate again (by $U$)
• i.e. have diagonalized the transformation
Review of Linear Algebra (Cont.)
SVD Compact Representation:
Useful Labeling: Singular Values in Decreasing Order: $s_1 \geq \cdots \geq s_{\min(n,d)}$
Note: singular values = 0 can be omitted
Let $r$ = # of positive singular values
Then: $X = U_{d \times r} S_{r \times r} V_{n \times r}^t$,
where $U_{d \times r}$, $S_{r \times r}$, $V_{n \times r}$ are truncations of $U, S, V$
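A sketch of the compact form on a made-up rank-2 matrix:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 4))   # rank r = 2 matrix

U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = int(np.sum(s > 1e-10))                    # # of positive singular values
Ur, Sr, Vrt = U[:, :r], np.diag(s[:r]), Vt[:r, :]       # truncations
print(r)                                      # 2
print(np.allclose(Ur @ Sr @ Vrt, X))          # compact form reproduces X
```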
Review of Linear Algebra (Cont.)
Eigenvalue Decomposition:
For a (symmetric) square matrix $X_{d \times d}$:
Find a diagonal matrix $D = \begin{pmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_d \end{pmatrix}$
And an orthonormal matrix $B_{d \times d}$ (i.e. $B^t B = B B^t = I_{d \times d}$)
So that: $X B = B D$, i.e. $X = B D B^t$
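A numpy sketch (`np.linalg.eigh` handles the symmetric case and returns an orthonormal $B$):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(4, 4))
X = A + A.T                                    # symmetric matrix

lam, B = np.linalg.eigh(X)                     # eigenvalues and orthonormal B
print(np.allclose(B @ np.diag(lam) @ B.T, X))  # X = B D B^t
print(np.allclose(X @ B, B @ np.diag(lam)))    # X B = B D
```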
Review of Linear Algebra (Cont.)
Eigenvalue Decomposition (cont.):
• Relation to Singular Value Decomposition (looks similar?):
• Eigenvalue decomposition “harder”, since needs $U = V$
• Price is eigenvalue decomp’n is generally complex valued
• Except for $X$ square and symmetric
• Then eigenvalue decomp. is real valued
• Thus is the sing’r value decomp. with: $U = V = B$