2001/02: Lecture 10. Statistical Methods for Data Analysis
Simple statistics (mean value, variance, histogram, covariance, correlation)
Regression analysis
PCA, ICA
Fourier series, wavelets
Simple statistics
Let $x_1, x_2, \ldots, x_n$ be a sample. From a natural-science point of view, the sample is the result of repeated, independent measurements of some subject.
From a mathematical point of view, the sample is the result of $n$ independent repetitions of a random experiment with a random variable $\xi$, which has the distribution function $F(x) = \Pr\{\xi \le x\}$ or the density function $f(x) = \dfrac{dF(x)}{dx}$.
The mean of the random variable: $$M = \int x f(x)\,dx$$
The variance of the random variable: $$D = \int (x - M)^2 f(x)\,dx = \int x^2 f(x)\,dx - M^2$$
Examples of random variables:
Binomial random variable (discrete). Let us toss a coin $m$ times. Suppose that the probability of getting heads (1) is $p$ and the probability of getting tails (0) is $q = 1 - p$. The binomial random variable $\xi$ is the number of heads (ones) among the $m$ results of tossing the coin.
$$\Pr\{1\} = p; \quad \Pr\{0\} = q = 1 - p;$$
$$\Pr\{\xi = k\} = C_m^k\, p^k q^{m-k}$$
$$M = mp, \quad D = mpq.$$
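As a quick check of $M = mp$ and $D = mpq$, the following minimal sketch computes the mean and variance directly from the binomial probabilities. The function names and the values m = 10, p = 0.3 are illustrative assumptions, not part of the lecture.

```python
from math import comb

def binomial_pmf(k, m, p):
    """Probability of exactly k heads in m tosses when Pr{heads} = p."""
    return comb(m, k) * p**k * (1 - p) ** (m - k)

def mean_and_variance(m, p):
    """Mean and variance obtained by summing over all outcomes k = 0..m."""
    probs = [binomial_pmf(k, m, p) for k in range(m + 1)]
    mean = sum(k * pk for k, pk in enumerate(probs))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))
    return mean, var

m, p = 10, 0.3  # illustrative values (assumption)
mean, var = mean_and_variance(m, p)
print(mean, m * p)           # the two numbers should agree up to rounding
print(var, m * p * (1 - p))  # the two numbers should agree up to rounding
```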
Uniform random variable (rectangular distribution) on the interval [0,1]:
$$f(x) = \begin{cases} 1 & \text{if } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$
$$M = 1/2, \quad D = 1/12$$
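The integral definitions of the mean and variance can be checked numerically for the uniform density. This is a minimal sketch using a midpoint rule; the helper `integrate` and the number of steps are assumptions made for illustration.

```python
def uniform_density(x):
    """Density of the uniform distribution on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def integrate(g, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

# The density is zero outside [0, 1], so integrating over [0, 1] is enough.
M = integrate(lambda x: x * uniform_density(x), 0.0, 1.0)
D = integrate(lambda x: (x - M) ** 2 * uniform_density(x), 0.0, 1.0)
print(M, D)  # approximately 1/2 and 1/12
```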
Gaussian (normal) random variable:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - M)^2}{2\sigma^2}}$$
The mean is $M$ and the variance is $\sigma^2$.
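The Gaussian density can be written out directly and compared with the standard library. A minimal sketch, assuming a mean of 0 and a standard deviation of 1 for illustration; `gaussian_density` is a hypothetical helper name.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def gaussian_density(x, M, sigma):
    """Normal density with mean M and standard deviation sigma."""
    return exp(-((x - M) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

M, sigma = 0.0, 1.0  # illustrative values (assumption)
for x in (-1.0, 0.0, 2.0):
    # Both columns should agree up to rounding.
    print(x, gaussian_density(x, M, sigma), NormalDist(M, sigma).pdf(x))
```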
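Let $x_1, x_2, \ldots, x_n$ be a sample.
Estimator of the mean (midrange): $$q = \frac{x_{\max} + x_{\min}}{2}$$
Another estimator of the mean, the sample mean: $$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Sample variance: $$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
Sample standard deviation: $$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$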
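The sample statistics above can be computed in a few lines. This is a minimal sketch on made-up data; the midrange is taken as $(x_{\max} + x_{\min})/2$, and the variance uses the $n - 1$ divisor, which is also what `statistics.variance` uses.

```python
from math import sqrt
from statistics import mean, stdev, variance

sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3]  # made-up data (assumption)

n = len(sample)
midrange = (max(sample) + min(sample)) / 2            # estimator q
x_bar = sum(sample) / n                               # sample mean
s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)  # sample variance
s = sqrt(s2)                                          # sample standard deviation

# Cross-check against the standard library implementations.
print(x_bar, mean(sample))
print(s2, variance(sample))
print(s, stdev(sample))
print(midrange)
```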
Histogram:
$$\text{bin} = \frac{x_{\max} - x_{\min}}{k},$$
$$\text{counter}_i = \text{counter}_i + 1 \quad \text{if } x_{\min} + (i-1)\cdot\text{bin} \le x_j < x_{\min} + i\cdot\text{bin}, \quad j = 1, 2, \ldots, n,$$
$$i = 1, 2, \ldots, k, \quad k \approx n/20$$
Here $\text{counter}_i$ is the number of sample elements inside the $i$-th bin.
The histogram is used to estimate the density function of the random variable $\xi$.
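The counting rule can be sketched as follows. Two choices here are assumptions: the sample maximum is placed in the last bin so that no element is dropped, and the density estimate divides each counter by n times the bin width so that the estimated density integrates to 1. The tiny data set and k = 4 are chosen by hand for illustration.

```python
def histogram(sample, k):
    """Count sample elements in k equal-width bins between min and max."""
    x_min, x_max = min(sample), max(sample)
    if x_max == x_min:
        raise ValueError("all sample elements are equal; bin width would be 0")
    bin_width = (x_max - x_min) / k
    counters = [0] * k
    for x in sample:
        # 0-based bin index: bin i covers [x_min + i*bin, x_min + (i+1)*bin).
        i = int((x - x_min) / bin_width)
        # Assumption: the maximum goes into the last bin.
        i = min(i, k - 1)
        counters[i] += 1
    return counters, bin_width

sample = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]  # made-up data
counters, bin_width = histogram(sample, k=4)
# Density estimate on each bin: counter_i / (n * bin).
density = [c / (len(sample) * bin_width) for c in counters]
print(counters)  # [2, 2, 2, 3]
print(density)
```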
Suppose that the sample consists of pairs $(x_i, y_i)$, $i = 1, 2, \ldots, n$.
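Bivariate Normal Distribution
$$f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \frac{(x - M_x)^2}{\sigma_x^2} - \frac{2\rho\,(x - M_x)(y - M_y)}{\sigma_x\sigma_y} + \frac{(y - M_y)^2}{\sigma_y^2} \right] \right\}$$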
Sample covariance: $$\mathrm{cov} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
Note that if X and Y are independent, their covariance is zero.
If the covariance between X and Y is zero then, generally speaking, we cannot conclude that X and Y are independent.
If X and Y are jointly normal (bivariate normal) random variables and their covariance is zero, then X and Y are independent.
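A small made-up sample illustrates why zero covariance does not imply independence: below, y is completely determined by x (y = x squared), yet the sample covariance is zero because the x values are symmetric about 0. The function name `sample_covariance` is an illustrative assumption.

```python
def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]  # symmetric about 0 (made-up data)
ys = [x * x for x in xs]          # y depends on x deterministically

print(sample_covariance(xs, ys))  # 0.0 for this data
```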
Sample correlation: $$\rho = \frac{\mathrm{cov}}{\sqrt{s_x^2 s_y^2}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\sqrt{s_x^2 s_y^2}}, \quad -1 \le \rho \le 1$$
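The sample correlation can be sketched directly from the sample covariance and the two sample variances. The data below are made up; an exact linear relation gives a correlation of +1 or -1 up to rounding, and a noisy one gives a value strictly between.

```python
from math import sqrt

def sample_correlation(xs, ys):
    """Sample correlation: cov / sqrt(s_x^2 * s_y^2), all with divisor n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    sx2 = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    sy2 = sum((y - y_bar) ** 2 for y in ys) / (n - 1)
    return cov / sqrt(sx2 * sy2)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys_up = [2.0 * x + 1.0 for x in xs]    # exact increasing linear relation
ys_down = [-x for x in xs]             # exact decreasing linear relation
ys_noisy = [2.1, 3.9, 6.2, 7.8, 10.1]  # roughly linear, made-up noise

print(sample_correlation(xs, ys_up))     # close to +1
print(sample_correlation(xs, ys_down))   # close to -1
print(sample_correlation(xs, ys_noisy))  # between -1 and 1, near +1
```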
Regression analysis (multiple regression)
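Independent variables (predictors): $X_1, X_2, \ldots, X_p$
Dependent variable (response): $Y$
Regression model:
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \ldots + \beta_p X_{pi} + \varepsilon_i, \quad i = 1, 2, \ldots, n, \quad n > p$$
$\beta_1$ is the intercept; $\beta_2, \ldots, \beta_p$ are the regression slope coefficients.
$\varepsilon_i$ is the residual term, a normal random variable with $M(\varepsilon_i) = 0$ and $\mathrm{cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \ne j$.
In matrix form: $$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \quad X_1 \equiv 1$$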
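Least squares method: choose $\boldsymbol{\beta}$ to minimize the residual sum of squares
$$\boldsymbol{\varepsilon}'\boldsymbol{\varepsilon} = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$$
This leads to the normal equations
$$(\mathbf{X}'\mathbf{X})\boldsymbol{\beta} = \mathbf{X}'\mathbf{Y}$$
$$\boldsymbol{\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$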
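The normal equations can be solved with a few lines of plain Python. This is a minimal sketch, not a production routine: it assumes the matrix X'X is non-singular (the predictors are not collinear), uses Gaussian elimination with partial pivoting, and all helper names and data are made up for illustration. The first column of X is the constant 1 for the intercept.

```python
def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def least_squares(X, Y):
    """Return beta solving the normal equations (X'X) beta = X'Y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    XtY = [sum(x * y for x, y in zip(row, Y)) for row in Xt]
    return solve(XtX, XtY)

# Made-up data generated exactly from Y = 1 + 2*X2 - 3*X3 (no noise).
X = [[1.0, 0.0, 1.0],
     [1.0, 1.0, 0.0],
     [1.0, 2.0, 2.0],
     [1.0, 3.0, 1.0],
     [1.0, 4.0, 3.0]]
Y = [1.0 + 2.0 * x2 - 3.0 * x3 for _, x2, x3 in X]

beta = least_squares(X, Y)
print(beta)  # approximately [1, 2, -3]
```

Forming X'X explicitly is the simplest way to mirror the formula; numerical libraries usually solve the least squares problem by other factorizations that are less sensitive to rounding.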
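Multiple coefficient of determination:
$$R^2 = 1 - \frac{\sum_i (Y_i - Y_i^m)^2}{\sum_i (Y_i - \bar{Y})^2}, \quad 0 \le R^2 \le 1$$
Here $Y_i^m$ is the value given by the regression model for the $i$-th observation and $\bar{Y}$ is the sample mean of the response.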
The numerator gives the error sum of squares and the denominator gives the total variation. If $R^2$ is close to 0, the regression model is hardly better than a simple mean-value model. If $R^2$ is close to 1, the regression model fits well and the error is small.
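The two extreme cases can be checked with a short sketch. The function name `r_squared` and the data are illustrative assumptions: a model that reproduces every observation gives 1, and a model that always predicts the mean gives 0.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - (error sum of squares) / (total variation)."""
    y_bar = sum(observed) / len(observed)
    sse = sum((y - m) ** 2 for y, m in zip(observed, predicted))
    sst = sum((y - y_bar) ** 2 for y in observed)
    return 1.0 - sse / sst

observed = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up responses
perfect = observed[:]                 # model reproduces every observation
mean_only = [3.0] * len(observed)     # model always predicts the sample mean

print(r_squared(observed, perfect))    # 1.0
print(r_squared(observed, mean_only))  # 0.0
```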
Principal Component Analysis (PCA)
http://www.cis.hut.fi/projects/ica/fastica/
Independent component analysis
http://www.cis.hut.fi/projects/ica/
Fourier Series and Wavelets
http://www.amara.com/current/wavelet.html
(Four iterations of a Daubechies wavelet)