2001/02: Lecture 10. Statistical Methods for Data Analysis

Simple statistics (mean value, variance, histogram, covariance, correlation)
Regression analysis
PCA, ICA
Fourier series, wavelets

Simple statistics

Let $x_1, x_2, \ldots, x_n$ be a sample. From a natural-science point of view, the sample is the result of repeated, independent measurements of some quantity. From a mathematical point of view, the sample is the result of $n$ independent repetitions of a random experiment with a random variable $\xi$, which has the distribution function $F(x)$, where $F(x) = \Pr\{\xi < x\}$, or the density function

$$f(x) = \frac{dF(x)}{dx}.$$

The mean of the random variable:

$$M = \int x f(x)\,dx$$

The variance of the random variable:

$$D = \int (x - M)^2 f(x)\,dx = \int x^2 f(x)\,dx - M^2$$

Examples of random variables:

Binomial random variable (discrete). Let us toss a coin $m$ times. Suppose that the probability of heads (1) is $p$ and the probability of tails (0) is $q = 1 - p$. The binomial random variable is the number of heads (ones) among the $m$ tosses.

$$\Pr\{1\} = p; \quad \Pr\{0\} = q = 1 - p; \quad \Pr\{\xi = k\} = C_m^k\, p^k q^{m-k}$$

$$M = mp, \quad D = mpq.$$

Uniform random variable (rectangular distribution) on the interval [0, 1]:

$$f(x) = \begin{cases} 1 & \text{if } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$

$$M = 1/2, \quad D = 1/12$$

(Figure: plot of the uniform density on [0, 1].)

Gaussian (normal) random variable:

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - M)^2}{2\sigma^2}}$$

The mean is $M$ and the variance is $\sigma^2$.

(Figure: plot of a Gaussian density for $x$ from -3 to 3.)

Let $x_1, x_2, \ldots, x_n$ be a sample, with smallest value $x_{\min}$ and largest value $x_{\max}$.

Estimator of the mean (the midrange):

$$q = \frac{x_{\max} + x_{\min}}{2}$$

Another estimator, the sample mean:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

Sample variance:

$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

Sample standard deviation:

$$s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Histogram: split the range of the sample into $k$ bins of equal width,

$$\text{bin} = \frac{x_{\max} - x_{\min}}{k},$$

and count the sample elements that fall into each bin:

$$\text{counter}_i = \text{counter}_i + 1 \quad \text{if} \quad x_{\min} + (i - 1) \cdot \text{bin} \le x_j < x_{\min} + i \cdot \text{bin},$$

$$j = 1, 2, \ldots, n, \quad i = 1, 2, \ldots, k, \quad k \approx n / 20.$$

Here $\text{counter}_i$ is the number of sample elements inside the $i$-th bin. The histogram is used to estimate the density function of the random variable $\xi$.
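The sample statistics and the histogram counters described above can be sketched in a few lines of Python. This is only an illustration using the standard library: the function names (`midrange`, `sample_mean`, `sample_variance`, `sample_std`, `histogram`) are hypothetical and not part of the lecture, and two details are assumptions of the sketch, namely that the largest sample value is counted in the last bin and that a sample with all values equal is placed entirely in the first bin.

```python
import math


def midrange(xs):
    """Midrange estimator of the mean: (x_max + x_min) / 2."""
    return (max(xs) + min(xs)) / 2


def sample_mean(xs):
    """Arithmetic mean of the sample."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Sample variance with the 1 / (n - 1) factor; needs n >= 2."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def sample_std(xs):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(xs))


def histogram(xs, k):
    """Count sample elements in k equal-width bins between x_min and x_max."""
    x_min, x_max = min(xs), max(xs)
    width = (x_max - x_min) / k
    if width == 0:
        # Assumption: a constant sample goes entirely into the first bin.
        return [len(xs)] + [0] * (k - 1)
    counters = [0] * k
    for x in xs:
        # Bin i covers [x_min + i * width, x_min + (i + 1) * width).
        # Assumption: x_max itself is counted in the last bin.
        i = min(int((x - x_min) / width), k - 1)
        counters[i] += 1
    return counters


sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(midrange(sample), sample_mean(sample), sample_std(sample))
print(histogram(sample, k=2))
```

Dividing each counter by `n * width` would turn the counts into a rough estimate of the density, which is the use of the histogram mentioned in the text.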
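Suppose that the sample consists of pairs $(x_i, y_i)$, $i = 1, 2, \ldots, n$.

Bivariate normal distribution:

$$f(x, y) = \frac{1}{2\pi \sigma_x \sigma_y \sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \frac{(x - M_x)^2}{\sigma_x^2} - \frac{2\rho (x - M_x)(y - M_y)}{\sigma_x \sigma_y} + \frac{(y - M_y)^2}{\sigma_y^2} \right] \right\}$$

Sample covariance:

$$\mathrm{cov} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

Note that if X and Y are independent, the covariance is zero. The converse does not hold in general: if the covariance between X and Y is zero, we cannot conclude that X and Y are independent. If X and Y are jointly normal random variables and the covariance is zero, then X and Y are independent.

Sample correlation:

$$r = \frac{\mathrm{cov}}{\sqrt{s_x^2 s_y^2}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1) \sqrt{s_x^2 s_y^2}}, \quad -1 \le r \le 1$$

Regression analysis (multiple regression)

Independent variables (predictors): $X_1, X_2, \ldots, X_p$
Dependent variable (response): $Y$

Regression model:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \ldots + \beta_p X_{pi} + \varepsilon_i, \quad i = 1, 2, \ldots, n, \quad n > p$$

Here $\beta_1$ is the intercept, $\beta_2, \ldots, \beta_p$ are the regression slope coefficients, and $\varepsilon_i$ is the residual term, a normal random variable with

$$M(\varepsilon_i) = 0, \quad \mathrm{cov}(\varepsilon_i, \varepsilon_j) = 0 \ \text{for } i \ne j.$$

In matrix form, with the first predictor fixed at $X_1 \equiv 1$ so that $\beta_1$ acts as the intercept:

$$Y = X\beta + \varepsilon$$

Least squares method: minimize the sum of squared residuals

$$\varepsilon' \varepsilon = (Y - X\beta)'(Y - X\beta),$$

which leads to the normal equations and their solution:

$$(X'X)\beta = X'Y, \qquad \beta = (X'X)^{-1} X'Y$$

Multiple coefficient of determination:

$$R^2 = 1 - \frac{\sum_i (Y_i - Y_i^m)^2}{\sum_i (Y_i - \bar{Y})^2}, \quad 0 \le R^2 \le 1,$$

where $Y_i^m$ is the value given by the model for the $i$-th observation. The numerator is the error sum of squares and the denominator is the total variation. If $R^2$ is close to 0, the regression model is hardly better than the simple mean-value model. If $R^2$ is close to 1, the regression model fits well and the error is small.

Principal Component Analysis (PCA)
http://www.cis.hut.fi/projects/ica/fastica/

Independent Component Analysis (ICA)
http://www.cis.hut.fi/projects/ica/

Fourier Series and Wavelets
http://www.amara.com/current/wavelet.html
(Figure: four iterations of a Daubechies wavelet.)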
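Returning to the sample covariance, the sample correlation, and the least squares method from the regression section, the formulas can be tried out with a short Python sketch. It is an illustration under stated assumptions, not the lecture's own code: all function names are hypothetical, the normal equations (X'X)β = X'Y are solved by plain Gaussian elimination rather than by forming the inverse, and X'X is assumed to be non-singular (otherwise the solver fails with a division by zero).

```python
import math


def mean(v):
    return sum(v) / len(v)


def sample_cov(xs, ys):
    """Sample covariance with the 1 / (n - 1) factor."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def sample_corr(xs, ys):
    """Sample correlation: cov / sqrt(s_x^2 * s_y^2)."""
    return sample_cov(xs, ys) / math.sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))


def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (m[i][n] - s) / m[i][i]
    return z


def least_squares(rows, ys):
    """Return beta solving (X'X) beta = X'Y.

    rows holds the predictor values (X_2i, ..., X_pi) of each observation;
    a leading 1 is prepended so that beta[0] is the intercept.
    """
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * y for row, y in zip(X, ys)) for a in range(p)]
    return solve(xtx, xty)


def r_squared(rows, ys, beta):
    """Coefficient of determination: 1 - (error sum of squares) / (total variation)."""
    fitted = [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in rows]
    y_bar = mean(ys)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst


# Made-up data lying close to the line y = 1 + 2x.
rows = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
beta = least_squares(rows, ys)
print(beta, r_squared(rows, ys, beta))
print(sample_corr([r[0] for r in rows], ys))
```

With a single predictor, the printed R² equals the square of the sample correlation, which gives a quick consistency check between the two sections.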