Independent Component Analysis
NiC fMRI-methodology Journal Club
05 Dec. 2008

Overview
• Theory
  – PCA
  – ICA
• Application
  – Data reduction
  – Spatial vs. Temporal ICA
  – Group-ICA
• There will be math …

Non-fMRI example
• The cocktail-party problem
  – Many people are speaking in a room
  – Given a number of microphone recordings from different locations, can the various sources be reconstructed?

Non-fMRI example
• Signals are collected in row vectors:
  – $r_1 = 0.8 s_1 + 0.6 s_2$
  – $r_2 = 0.2 s_1 + 0.5 s_2$
• In matrix form, $R = A S$:
  $\begin{pmatrix} r_1 \\ r_2 \end{pmatrix} = \begin{pmatrix} 0.8 & 0.6 \\ 0.2 & 0.5 \end{pmatrix} \begin{pmatrix} s_1 \\ s_2 \end{pmatrix}$
[Figure: scatter plots of the sources (s1 vs. s2) and of the recordings (r1 vs. r2)]

Non-fMRI example
• Assumptions
  – Recordings consist of linearly transformed sources ('weighted averages')
  – Original sources are completely unrelated
    • Uncorrelated
    • Independent
• Then
  – Sources can be estimated by linearly (back-)transforming the recordings
  – The optimal transformation makes the source estimates 'as unrelated as possible'
    • Minimize correlations
    • Maximize independence
• In fMRI
  – Sources of interest are the various 'brain systems'
  – Recordings are the acquisition volumes and/or voxel time courses

Non-fMRI example
[Figure: time courses of the sources S1 and S2 and of the recordings R1 and R2]

Theory
• PCA
• ICA

PCA
• Recordings typically are not uncorrelated: with $R = \begin{pmatrix} r_1 \\ r_2 \end{pmatrix}$,
  $R R^T = \begin{pmatrix} r_1 r_1^T & r_1 r_2^T \\ r_2 r_1^T & r_2 r_2^T \end{pmatrix}$, with $r_1 r_2^T \neq 0$
• Goal: find a transformation matrix V such that the transformed signals $Y = V R$ are uncorrelated:
  $y_1 y_2^T = (v_1 R)(v_2 R)^T = v_1 (R R^T) v_2^T = 0$
• If the transformed signals are also properly normalized ($|y_i| = 1$):
  $Y Y^T = V (R R^T) V^T = I$, i.e. $V^T V = (R R^T)^{-1}$

PCA
• Q: how do we find V such that $V^T V = (R R^T)^{-1}$?
  (note that $R R^T$ equals the covariance matrix)
• A: use the singular value decomposition (SVD),
  $R = U_L \Lambda U_R^T$,
  such that $V^T V = (R R^T)^{-1} = U_L \Lambda^{-2} U_L^T$,
  resulting in the solution $V = \Lambda^{-1} U_L^T$

PCA
• In pictures, PCA performs 'sphering':
  – Fit the data cloud with an ellipsoidal function
  – Rotate the ellipsoid such that its axes coincide with the coordinate axes: $X = U_L^T R$
  – Squeeze/stretch the ellipsoid to make it spherical: $Y = \Lambda^{-1} U_L^T R$
[Figure: recordings (r1, r2), after rotation (x1, x2), and after squeezing/stretching (y1, y2)]

PCA
• By means of a linear transformation of the recordings r1 and r2, we found signals y1 and y2 that are normalized and completely uncorrelated
• Yet, these signals do not at all correspond with the original sources s1 and s2, and are obviously still mixtures
• However, any rotation will preserve uncorrelatedness
• So, how to determine the 'best' rotation?
[Figure: scatter plot of the sphered signals y1 vs. y2]

ICA
• Statistical independence
  – Knowledge of the value of y1 for a sample does not affect the statistical distribution of the values of y2:
    $P(y_2 \mid y_1) = P(y_2)$
  – Equivalently, the joint probability distribution is the product of the marginal probability distributions:
    $P(y_1, y_2) = P(y_1) P(y_2)$
  – As a result, loosely speaking, all non-gaussian 'features' of the probability distribution are caused by either P(y1) or P(y2), but not by both, and therefore lie parallel to the coordinate axes

ICA
• Independence implies uncorrelatedness (but the reverse is not true!)
• Therefore, maximally independent signals are likely to be approximately uncorrelated as well
• This suggests performing PCA first, to decorrelate the signals, and determining a suitable rotation next, one that optimizes independence while (automatically) preserving uncorrelatedness
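The sphering step can be tried out directly in MatLab. Below is a minimal sketch, assuming two synthetic uniform (hence non-gaussian) sources instead of the wav-file used on the later code slide; the names S, A, R, V, and Y match the slides.

% Sphering demo with synthetic sources
N = 10000;
S = rand(2,N) - 0.5;             % two independent, non-gaussian sources as row vectors
A = [0.8,0.6;0.2,0.5];           % mixing matrix from the cocktail-party example
R = A*S;                         % recordings
R = R - mean(R,2)*ones(1,N);     % remove the mean of each recording

% PCA/sphering via the SVD, exactly as derived above
[UL,LAMBDA,UR] = svd(R,'econ');  % R = UL*LAMBDA*UR'
V = inv(LAMBDA)*UL';             % V = LAMBDA^(-1)*UL'
Y = V*R;                         % sphered signals
disp(Y*Y')                       % equals the 2 x 2 identity matrix (up to rounding)

Note that any further rotation Z = W*Y with an orthogonal W leaves Y*Y' unchanged; that remaining freedom is exactly what the ICA rotation exploits.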
ICA
• A simple idea: penalize every data point with the penalty $P = y_1^2 y_2^2$
  – No penalty if $y_1 = 0$ or $y_2 = 0$, i.e. for points on either axis
  – A positive penalty for points in any of the four quadrants
  – An increasing penalty for points that are further away from the axes
[Figure: scatter plot with the penalty indicated relative to the y1 and y2 axes]

ICA
• Minimize the penalty over a search space that consists of all rotations
• The solution is determined only up to
  – Component order
  – Component magnitude
  – Component sign
[Figure: scatter plot of the rotated components z1 vs. z2]

ICA
• Average penalty:
  $\bar{P} = \frac{1}{N} \sum_j y_{1,j}^2 \, y_{2,j}^2$
• Rotations preserve each point's euclidean distance to the origin, so
  $\frac{1}{N} \sum_j \left( y_{1,j}^2 + y_{2,j}^2 \right)^2 = C$ (a constant)
• It follows that minimizing $\bar{P}$ is equivalent to maximizing
  $K = C - 2\bar{P} = \frac{1}{N} \sum_j \left( y_{1,j}^4 + y_{2,j}^4 \right)$
• K is closely related to kurtosis ('peakiness')

ICA
• Central limit theorem
  – A mixture of a sufficiently large number of independent random variables (each with finite mean and variance) will be approximately normally distributed
• This suggests that the original sources are more non-gaussian than the mixtures
  – Maximum non-gaussianity is the criterion to use to determine the optimal rotation!
  – This cannot be successful if the sources are normally distributed
  – QuartiMax employs kurtosis as a measure of non-gaussianity

ICA
• If some sources are platykurtotic (flatter than a gaussian), the method may fail
• Therefore, maximize
  $K = \left( E[y_i^4] - E[\nu^4] \right)^2$,
  where $\nu$ is a gaussian variable

ICA
• Due to the fourth power, the method is sensitive to outliers
• Therefore, choose any other suitable function G:
  $K = \left( E[G(y_i)] - E[G(\nu)] \right)^2$
  – $G(y) = y^4$
  – $G(y) = \log(\cosh(y))$
  – $G(y) = 1 - \exp(-y^2)$
• FastICA [Hyvärinen]

ICA
• Some non-gaussian distributions will happen to have K = 0
• Therefore, use mutual information, expressed in terms of negentropy:
  $K = \left( E[\log P(y_i)] - E[\log P(\nu)] \right)^2$
• InfoMax [Bell & Sejnowski]

ICA
• Entropy calculations require an approximation of the probability density function P itself
• Therefore, expand the negentropy in terms of lower-order cumulants (generalized variance/skewness/kurtosis/…):
  $K = \left( E[Q_{ijkl}(y)] - E[Q_{ijkl}(\nu)] \right)^2$
• JADE [Cardoso]

ICA MatLab code

% Generate signals
S = wavread('D:\sources.wav')';   % sources as row vectors
A = [0.8,0.6;0.2,0.5];            % mixing matrix
R = A*S;                          % recordings
clear S A;                        % note: 'clear S,A' would clear only S

% PCA
[UL,LAMBDA,UR] = svd(R,'econ');
V = inv(LAMBDA)*UL';
Y = V*R;

% ICA (four alternatives; each call overwrites Z)
Z = rotatefactors(Y','Method','quartimax')';        % QuartiMax
Z = icatb_fastICA(Y,'approach','symm','g','tanh');  % FastICA
[d1,d2,Z] = icatb_runica(Y);                        % InfoMax
Z = icatb_jade_opac(Y);                             % JADE

ICA
• In practice, all mentioned algorithms have been reported to perform satisfactorily
  – QuartiMax
  – FastICA
  – InfoMax
  – JADE
[Figure: example source estimates obtained with QuartiMax, FastICA, InfoMax, and JADE]
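To make the rotation search concrete, here is a brute-force sketch that continues the sphering demo above (Y and N are assumed to exist). It scans rotation angles and keeps the angle that maximizes the contrast $K = \left( E[y_i^4] - E[\nu^4] \right)^2$ from the slides; this is a didactic stand-in, not an implementation of any of the four algorithms listed above.

% Brute-force rotation search over all 2-D rotations
Y = sqrt(N)*Y;                   % rescale the rows to unit variance
bestK = -inf;
for theta = 0:0.001:pi/2         % order/sign ambiguity: 90 degrees suffice
  W = [cos(theta),-sin(theta);sin(theta),cos(theta)];
  Z = W*Y;                       % candidate rotation
  K = (mean(Z(1,:).^4)-3)^2 + (mean(Z(2,:).^4)-3)^2;   % E[nu^4] = 3
  if K > bestK
    bestK = K;
    Zest = Z;                    % best source estimates so far
  end
end

With the uniform sources of the demo, Zest recovers the sources up to exactly the ambiguities listed earlier: component order, magnitude, and sign.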
Application
• Data reduction
• Spatial vs. Temporal ICA
• Group-ICA

Data reduction
• fMRI-data are gathered in a matrix Y (voxels × time)
• Data are decomposed into principal components by means of the SVD:
  $Y = U_x \Lambda U_t^T$
  ($U_x$: voxels × components; $\Lambda$: components × components; $U_t$: time × components)
• Only the first few strongest components are retained
• Each component is the product of a coefficient, a spatial map, and a time course:
  $Y = \lambda_1 \times \text{map}_1 \times \text{time course}_1 + \lambda_2 \times \ldots + \lambda_3 \times \ldots + \text{residuals}$

Temporal vs. Spatial ICA
• The PCA decomposition results in
  – Uncorrelated spatial maps, and
  – Uncorrelated time courses
• The ICA rotation results in
  – Maximally independent time courses: temporal ICA or tICA
  – Maximally independent spatial maps: spatial ICA or sICA
• Some methods employ criteria in both domains
[Figure: tICA: acquisitions plotted against voxel i and voxel j, with a component map as a direction; sICA: voxels plotted against acquisition i and acquisition j, with a component time course as a direction]

Temporal vs. Spatial ICA
• PCA: $Y = U_x \Lambda U_t^T$
• tICA: $Y = (U_x \Lambda A) S_t = V_x S_t$
• sICA: $Y = S_x (A \Lambda U_t^T) = S_x V_t$

Temporal vs. Spatial ICA
• Temporal ICA
  – Components have independent temporal dynamics: "the strength of one component at a particular moment in time does not provide information on the strength of other components at that moment"
  – Components may be correlated/dependent in space
  – Popular for the cocktail-party problem
• Spatial ICA
  – Components have independent spatial distributions: "the strength of one component in a particular voxel does not provide information on the strength of other components in that voxel"
  – Components may be correlated/dependent in time
  – Popular for fMRI

Temporal vs. Spatial ICA
[Figure: comparison of temporal and spatial ICA, from Calhoun et al., 2001]
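A minimal MatLab sketch of the data-reduction step described above. The names are illustrative, not toolbox code: Y stands for a hypothetical voxels × time data matrix, k for the number of retained components.

% Rank-k data reduction by truncated SVD
k = 20;                                        % number of components to retain
[Ux,LAMBDA,Ut] = svd(Y,'econ');                % Y = Ux*LAMBDA*Ut'
Yred = Ux(:,1:k)*LAMBDA(1:k,1:k)*Ut(:,1:k)';   % rank-k approximation; Y - Yred are the residuals
% sICA would now rotate the k spatial maps Ux(:,1:k)', tICA the k time
% courses Ut(:,1:k)', e.g. with the algorithms from the MatLab-code slide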
Group-ICA
• Some problems at the subject level become even more pressing at the group level
  – Independent components have no natural interpretation
  – Independent components have no meaningful order
  – The magnitude of independent component maps and time courses is undetermined
  – The sign of independent component maps and time courses is arbitrary
• For group level analyses, some form of 'matching' is required
  – Assumptions
    • Equal coefficients across subjects?
    • Equal distributions across subjects?
    • Equal dynamics across subjects?

Group-ICA
• Method I: Averaging
  – E.g. Schmithorst et al., 2004
• Principle
  – Average the data sets of all subjects before ICA
  – Perform ICA on the mean data
• Key points
  – All subjects are assumed to have identical components
    • Equal coefficients > homogeneous population
    • Equal distributions > comparable brain organization
    • Equal dynamics > fixed paradigm; resting state impossible
  – Statistical assessment at group level
    • Enter ICA time courses into a linear regression model (back-projection)

Group-ICA
• Method II: Tensor-ICA
  – E.g. Beckmann et al., 2005
• Principle
  – Stack subjects' data matrices along a third dimension
  – Decompose the data tensor into a product of (acquisition-dependent) time courses, (voxel-dependent) maps, and (subject-dependent) loadings
• Key points
  – Components may differ only in strength
    • Unequal coefficients > inhomogeneous population
    • Equal distributions > comparable brain organization
    • Equal dynamics > fixed paradigm; resting state impossible
  – Statistical assessment at group level
    • Components as a whole

Group-ICA
• Method III: Spatial concatenation
  – E.g. Svensén et al., 2002
• Principle
  – Concatenate subjects' data matrices along the spatial dimension
  – Perform ICA on the aggregate data matrix
  – Partition the resulting components into individual maps
• Key points
  – Components may differ in strength and distribution
    • Unequal coefficients > inhomogeneous population
    • Unequal distributions > brain plasticity
    • Equal dynamics > fixed paradigm; resting state impossible
  – Statistical assessment at group level
    • Voxel-by-voxel SPMs

Group-ICA
• Method IV: Temporal concatenation (see the code sketch at the end of this document)
  – E.g. Calhoun et al., 2001
• Principle
  – Concatenate subjects' data matrices along the time dimension
  – Perform ICA on the aggregate data matrix
  – Partition the resulting components into individual time courses
• Key points
  – Components may differ in strength and dynamics
    • Unequal coefficients > inhomogeneous population
    • Equal distributions > comparable brain organization
    • Unequal dynamics > flexible paradigm
  – Statistical assessment at group level
    • Components as a whole, from time course spectrum/power
    • Voxel-by-voxel SPMs, from back-projection (careful with statistics!)

Group-ICA
• Method V: Retrospective matching
  – Various principles, various authors
  – Subjective or (semi)automatic matching on the basis of similarity between distribution maps and/or time courses
  – Also used to test the reproducibility of some stochastic ICA algorithms
• Principle
  – Perform ICA on individual subjects
  – Match similar individual components one-on-one across subjects
• Key points
  – Components may differ in strength, distribution, and dynamics
    • Unequal coefficients > inhomogeneous population
    • Unequal distributions > brain plasticity
    • Unequal dynamics > flexible paradigm
  – Statistical assessment at group level
    • Voxel-by-voxel SPMs (careful with scaling and bias!)

Conclusion
• ICA has the major advantage that it requires minimal assumptions and little prior knowledge
• However, interpretation of the meaning of components occurs retrospectively and may be ambiguous
• Unfortunately, the methods and statistics are not fully characterized yet and are still heavily under development
• Therefore - IMHO - independent component analysis is an excellent tool for exploratory experiments, but should not be your first choice for confirmatory studies
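As referenced under Method IV, a minimal sketch of group-ICA by temporal concatenation. All names are assumptions for illustration: Ysub is a hypothetical cell array of subject data matrices (voxels × time, already reduced as in the data-reduction sketch), and the input/output conventions of the GIFT routine icatb_fastICA from the MatLab-code slide should be checked against your version's documentation.

% Group-ICA by temporal concatenation
Yagg = cat(2,Ysub{:});                % stack the subjects along the time dimension
Smaps = icatb_fastICA(Yagg','approach','symm','g','tanh');   % spatial ICA: rows are group maps
Tc = Yagg'*pinv(Smaps);               % back-projection: aggregate time courses (one column per component)
nT = size(Ysub{1},2);                 % acquisitions per subject
Tc1 = Tc(1:nT,:);                     % partition: time courses of subject 1, and so on

The voxel-by-voxel SPMs mentioned on the slide would then be computed per subject from these partitioned time courses (again: careful with the statistics).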