2001/02: Lecture 10. Statistical Methods for Data Analysis

- Simple statistics (mean value, variance, histogram, covariance, correlation)
- Regression analysis
- PCA, ICA
- Fourier series, wavelets

Simple statistics

Let $x_1, x_2, \dots, x_n$ be a sample. From a natural-science point of view, the sample is the result of repeated, independent measurements of some quantity. From a mathematical point of view, the sample is the result of $n$ independent repetitions of a random experiment with a random variable $\xi$, which has the distribution function $F(x) = \Pr\{\xi \le x\}$ or the density function

\[ f(x) = \frac{dF(x)}{dx}. \]

The mean of the random variable:

\[ M = \int_{-\infty}^{\infty} x f(x)\,dx \]

The variance of the random variable:

\[ D = \int_{-\infty}^{\infty} (x - M)^2 f(x)\,dx = \int_{-\infty}^{\infty} x^2 f(x)\,dx - M^2 \]

Examples of random variables

Binomial random variable (discrete). Toss a coin $m$ times. Suppose that the probability of heads (1) is $p$ and the probability of tails (0) is $q = 1 - p$. The binomial random variable is the number of heads (ones) among the $m$ tosses.

\[ \Pr\{1\} = p; \quad \Pr\{0\} = q = 1 - p; \quad \Pr\{\xi = k\} = C_m^k\, p^k q^{m-k} \]

\[ M = mp, \quad D = mpq. \]

Uniform random variable (rectangular distribution) on the interval [0, 1]:

\[ f(x) = \begin{cases} 1 & \text{if } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \]

\[ M = 1/2, \quad D = 1/12 \]

[Figure: uniform density on the interval [0, 1].]

Gaussian (normal) random variable:

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - M)^2}{2\sigma^2}} \]

The mean is $M$, the variance is $\sigma^2$.

[Figure: Gaussian density plotted for x from -3 to 3.]

Let $x_1, x_2, \dots, x_n$ be a sample.
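Such a sample can be simulated to check the moments quoted above for the binomial, uniform and Gaussian examples. The following is a minimal sketch, assuming NumPy is available; the function names and the parameter values (m = 10, p = 0.3, M = 1, sigma = 2, the sample size and the seed) are illustrative choices and do not come from the lecture.

```python
import numpy as np


def empirical_moments(sample):
    """Return (mean, variance with divisor n - 1) of a 1-D sample."""
    sample = np.asarray(sample, dtype=float)
    return sample.mean(), sample.var(ddof=1)


def simulate_examples(n=200_000, m=10, p=0.3, mu=1.0, sigma=2.0, seed=0):
    """Draw n values from each example distribution.

    Returns a dict: name -> ((empirical mean, empirical variance),
                             (theoretical M, theoretical D)).
    All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    draws = {
        "binomial": rng.binomial(m, p, size=n),     # number of heads in m tosses
        "uniform": rng.uniform(0.0, 1.0, size=n),   # rectangular on [0, 1)
        "gaussian": rng.normal(mu, sigma, size=n),  # mean mu, variance sigma^2
    }
    theory = {
        "binomial": (m * p, m * p * (1.0 - p)),     # M = mp, D = mpq
        "uniform": (0.5, 1.0 / 12.0),               # M = 1/2, D = 1/12
        "gaussian": (mu, sigma ** 2),               # M = mu, D = sigma^2
    }
    return {name: (empirical_moments(x), theory[name]) for name, x in draws.items()}


if __name__ == "__main__":
    for name, (emp, th) in simulate_examples().items():
        print(f"{name:9s} empirical M, D = {emp[0]:.4f}, {emp[1]:.4f}; "
              f"theoretical M, D = {th[0]:.4f}, {th[1]:.4f}")
```

With a large sample, the empirical values should lie close to the theoretical ones; the agreement improves as n grows.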
( x max  x min ) 2 n 1 Another sample mean: x   xi n i 1 1 n ( xi  x ) 2 Sample variance: s 2   n  1 i 1 Estimator of the mean: q  Sample standard deviation: s  1 n  ( xi  x )2 n  1 i 1 Histogram: ( x max  x min ) , k counteri  counteri  1 if x min  (i  1) * bin  x j  x min  i * bin , j  1,2, ..., n, x min , x max , bin  i  1,2,..., k , k  n / 20 This counteri shows the number of sample elements inside of the i-th bin. The histogram is used to estimate the density function of the random variable  . Let suppose that the sample consists of pairs ( xi , yi ), i  1,2,..., n . Bivariate Normal Distribution f ( x, y )  1 2 x y   ( x  M x )2 ( x  M x )( y  M y ) ( y  M y ) 2   1    exp   2  2  2 2  2  2 ( 1   )     1  x x y y    2 n Sample covariance is cov  (x i 1 i  x )( yi  y ) n 1 Note that in case of independence between X and Y, the covariance is zero. If covariance between X and Y is zero then, generally speaking, we do not know if X and Y are independent. If X and Y are normal random variables and covariance is zero then X and Y are independent. n Sample correlation:   cov s x2 s 2y  (x i 1 i  x )( yi  y ) (n  1) s x2 s 2y , 1    1 Regression analysis (multiple regression) Independent variables (predictors) X 1 , X 2 ,..., X p Dependent variable (response) Y Regression model: Yi  1   2 X 2i   3 X 3i  ...   p X pi   i , i  1,2,...n, n  p 1 is the intercept ,  2 ,... p are the regression slope coefficien ts  i is the residual term, normal random variable. M ( i )  0, cov( i j )  0 Y  Xβ  ε, X 1  1 Least square method: ε' ε  ( Y  Xβ)' ( Y  Xβ) ( XX' )β  X' Y β  (X' X) 1 X' Y Multiple coefficient of determination:  (Yi  Yi m )2 , 0  R 2  1 R2  1   (Yi  Y )2 The numerator gives the error sums-of squares and the denominator gives the total variation. 
If $R^2$ is close to 0, the regression model and the simple mean-value model give very similar fits. If $R^2$ is close to 1, the regression model fits the data well and the error is small.

Principal Component Analysis (PCA)
http://www.cis.hut.fi/projects/ica/fastica/

Independent Component Analysis (ICA)
http://www.cis.hut.fi/projects/ica/

Fourier Series and Wavelets
http://www.amara.com/current/wavelet.html

[Figure: four iterations of a Daubechies wavelet.]
