STATISTICAL DATA ANALYSIS
Prof. Janusz Gajda
Dept. of Instrumentation and Measurement
Plan of the lecture
• Classification of the measuring signals according to their statistical properties.
• Definition of basic parameters and characteristics of the stochastic signals (expected value, variance, mean-square value, probability density function, power spectral density, joint probability density function, cross-correlation function, spectral density function, transfer function).
• Interpretation of the basic statistical characteristics.
• Elements of statistics: estimation theory and hypothesis verification theory, parametric and non-parametric estimation, point and interval estimation. Good properties of the estimator:
- unbiased,
- efficient,
- consistent,
- robust.
• Application of the point estimation and interval estimation methods in determination of estimates of these parameters and characteristics.
• Estimators of basic parameters and characteristics of random signals: mean value and variance, probability density function, autocorrelation and cross-correlation function, power spectral density and mutual spectral density, coherence function, transfer function.
• Analysis of the statistical properties of those estimators.
• Determination of the confidence intervals of basic statistical parameters for an assumed confidence level.
• Statistical hypotheses and their verification.
• Errors of the first and second kind observed during the verification process.
Classification of the measuring signals according to their statistical properties

Deterministic signals:
• Periodic signals
- Mono-harmonic signals
- Poly-harmonic signals
• Non-periodic signals
- Almost periodic signals
- Transient signals
Periodic signals

Mono-harmonic signals:
$$x(t) = A \sin(\omega_0 t + \varphi)$$
where:
A – signal amplitude,
$\omega_0 = 2\pi f_0$ – angular frequency,
$\varphi$ – initial phase angle.

Poly-harmonic signals:
$$x(t) = \sum_n A_n \sin(n \omega_0 t + \varphi_n)$$
where:
$A_n$ – amplitude of the n-th harmonic component,
$\omega_0 = 2\pi f_0$ – basic angular frequency,
$\varphi_n$ – initial phase angle of the n-th component.

Frequency spectrum of the periodic signals:
$$x(t) = \sum_{n=1}^{\infty} X_n \sin(2\pi f_n t + \varphi_n)$$
where the ratio $f_n / f_{n+k}$ is a rational number for every $k$.
Classification of the measuring signals according to their statistical properties

Stochastic signals:
• Stationary signals
- Ergodic signals
- Non-ergodic signals
• Non-stationary signals
- Different classes of non-stationary signals
Set of realizations of the random quantity
[Figure: five realizations x1(t), …, x5(t) of the random quantity plotted versus time; the values xi(t1) and xi(t2) at two fixed time instants t1 and t2 are marked on each realization.]
Basic statistical characteristics

Mean-square value:
$$\psi_x^2 = E\left[x^2(t)\right] = \lim_{T\to\infty} \frac{1}{T}\int_0^T x^2(t)\,dt$$

Root-mean-square value:
$$x_{sk} = \sqrt{\psi_x^2}$$

Expected value:
$$\mu_x = E\left[x(t)\right] = \lim_{T\to\infty} \frac{1}{T}\int_0^T x(t)\,dt$$

Variance:
$$\sigma_x^2 = E\left[\left(x(t) - \mu_x\right)^2\right] = \lim_{T\to\infty} \frac{1}{T}\int_0^T \left(x(t) - \mu_x\right)^2 dt$$

Probability function:
$$\Pr\left[x < x(t) \le x + \Delta x\right] = \lim_{T\to\infty} \frac{T_x}{T}$$

Probability density function:
$$p(x) = \lim_{\Delta x \to 0} \frac{\Pr\left[x < x(t) \le x + \Delta x\right]}{\Delta x} = \lim_{\Delta x \to 0} \frac{1}{\Delta x}\left[\lim_{T\to\infty} \frac{T_x}{T}\right]$$
where $T_x$ is the total time the signal spends between $x$ and $x + \Delta x$ during the observation time $T$.
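These time-average definitions translate directly into sample averages over a finite record. Below is a minimal sketch in Python with NumPy; the signal, sampling rate, and noise level are illustrative assumptions, not taken from the lecture:

```python
import numpy as np

# Illustrative sampled realization: a 2 Hz sine in Gaussian noise
rng = np.random.default_rng(0)
t = np.arange(0.0, 4.0, 0.001)                 # 4 s sampled at 1 kHz
x = np.sin(2 * np.pi * 2.0 * t) + 0.5 * rng.standard_normal(t.size)

mean_sq = np.mean(x**2)            # mean-square value  psi_x^2
rms = np.sqrt(mean_sq)             # root-mean-square value x_sk
mu = np.mean(x)                    # expected value mu_x
var = np.mean((x - mu)**2)         # variance sigma_x^2

# Probability density estimated as a normalized histogram: the fraction
# of samples per bin divided by the bin width, the discrete analogue of
# lim (T_x / T) / delta_x.
p, edges = np.histogram(x, bins=50, density=True)

print(f"psi_x^2={mean_sq:.3f}  x_sk={rms:.3f}  mu={mu:.3f}  var={var:.3f}")
```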
Most popular distributions:

Standardised normal distribution:
$$p(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}$$

Normal distribution:
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
[Figures: probability density p(x) of the normal distribution for several values of the mean and standard deviation; the shaded area within ±σ of the mean equals 0.68, and within ±2σ equals 0.95.]
Normal distribution – cumulative probability:
$$P(x) = \int_{-\infty}^{x} p(\xi)\,d\xi$$
[Figure: cumulative probability P(x) of the standardised normal distribution rising from 0 to 1 over −5 ≤ x ≤ 5.]
Normal distribution – quantiles:
$$\Pr\left[x \le x_p\right] = \int_{-\infty}^{x_p} p(\xi)\,d\xi = P(x_p) = p$$
$$x_p = P^{-1}(p)$$
[Figure: quantile x_p of the standardised normal distribution plotted against the cumulative probability P(x_p) from 0 to 1.]
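The density, cumulative probability, and quantiles above can be evaluated numerically; a short sketch using scipy.stats, with the particular probabilities chosen only to reproduce the shaded areas shown earlier:

```python
from scipy import stats

norm = stats.norm(loc=0.0, scale=1.0)    # standardised normal distribution

print(norm.pdf(0.0))                     # density p(0) ≈ 0.399
print(norm.cdf(1.0))                     # cumulative probability P(1) ≈ 0.841
print(norm.ppf(0.975))                   # quantile x_p for p = 0.975 ≈ 1.960

# Coverage of the ±σ and ±2σ intervals (the shaded areas 0.68 and 0.95):
print(norm.cdf(1.0) - norm.cdf(-1.0))    # ≈ 0.683
print(norm.cdf(2.0) - norm.cdf(-2.0))    # ≈ 0.954
```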
Most popular distributions:

Chi-square ($\chi^2$) distribution:
$$p(\chi^2; n) = \frac{1}{2^{n/2}\,\Gamma\!\left(\frac{n}{2}\right)}\left(\chi^2\right)^{\frac{n}{2}-1} e^{-\frac{\chi^2}{2}}$$
where n is the number of degrees of freedom.

[Figure: probability density p(χ²) for n = 1, 2, 3 and 10 degrees of freedom over 0 ≤ χ² ≤ 20.]

Chi-square distribution – cumulative distribution:
[Figure: cumulative probability P(χ²) for n = 2, 3, 4, …, 20 degrees of freedom.]
Most popular distributions:

Student's t distribution – probability density:
$$p(t) = \frac{\Gamma\!\left(\frac{f+1}{2}\right)}{\sqrt{\pi f}\;\Gamma\!\left(\frac{f}{2}\right)}\left(1 + \frac{t^2}{f}\right)^{-\frac{f+1}{2}}$$
where f is the number of degrees of freedom.

[Figure: probability density p(t) for f = 1, 2 and 10 degrees of freedom over −6 ≤ t ≤ 6.]

Student's t distribution – cumulative probability:
[Figure: cumulative probability P(t) for f = 2, 3, 4, …, 10 degrees of freedom over −10 ≤ t ≤ 10.]
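Both families are available in scipy.stats; a minimal sketch evaluating the densities plotted above and the quantiles used later for interval estimation (the chosen probabilities and degrees of freedom are illustrative):

```python
import numpy as np
from scipy import stats

# Chi-square density for n = 1, 2, 3, 10 degrees of freedom
chi = np.linspace(0.01, 20.0, 200)
chi2_pdfs = {n: stats.chi2.pdf(chi, df=n) for n in (1, 2, 3, 10)}

# Student's t density for f = 1, 2, 10 degrees of freedom
t = np.linspace(-6.0, 6.0, 200)
t_pdfs = {f: stats.t.pdf(t, df=f) for f in (1, 2, 10)}

# Quantiles (inverse cumulative probability) of both distributions:
print(stats.chi2.ppf(0.95, df=10))    # 95% quantile of chi^2 with n = 10
print(stats.t.ppf(0.975, df=10))      # 97.5% quantile of t with f = 10
```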
Auto-correlation function:
$$K_x(\tau) = \lim_{T\to\infty} \frac{1}{T}\int_0^T x(t)\,x(t+\tau)\,dt$$
$$R_x(\tau) = \lim_{T\to\infty} \frac{1}{T}\int_0^T \left(x(t) - \mu_x\right)\left(x(t+\tau) - \mu_x\right)dt$$
[Figures: example auto-correlation functions R_x(τ) of three different signals, plotted versus the time shift τ [s].]
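A minimal sketch of the time-average estimate of R_x(τ) over a finite record; the test signal below is an illustrative assumption:

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Biased time-average estimate of R_x(tau) for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                   # remove mu_x
    N = x.size
    return np.array([np.sum(x[:N - k] * x[k:]) / N
                     for k in range(max_lag + 1)])

# Illustrative signal: 2 Hz sine plus white noise, sampled at 1 kHz
rng = np.random.default_rng(1)
fs = 1000.0
t = np.arange(0.0, 4.0, 1.0 / fs)
x = np.sin(2 * np.pi * 2.0 * t) + 0.5 * rng.standard_normal(t.size)

R = autocorrelation(x, max_lag=1000)   # lags 0 .. 1 s
tau = np.arange(R.size) / fs           # lag axis in seconds
```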
Power spectral density:
$$G_x(f) = \lim_{\Delta f \to 0} \frac{1}{\Delta f}\left[\lim_{T\to\infty} \frac{1}{T}\int_0^T x^2(t, f, \Delta f)\,dt\right]$$
where $x(t, f, \Delta f)$ denotes the part of the signal passed by an ideal band-pass filter of bandwidth $\Delta f$ centred at $f$.
$$S_x(f) = \int_{-\infty}^{\infty} R_x(\tau)\, e^{-j 2\pi f \tau}\, d\tau$$

Example:
$$x(t) = A_1 \sin\left(2\pi f_1 t + \varphi_1\right) + A_2 \sin\left(2\pi f_2 t + \varphi_2\right), \quad f_1 = 2\ \mathrm{Hz},\ f_2 = 6\ \mathrm{Hz}$$
[Figure: power spectral density G(f) of x(t) with two peaks, of height 0.5·A₁² at f₁ = 2 Hz and 0.5·A₂² at f₂ = 6 Hz.]
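One common way to estimate G_x(f) in practice is Welch's averaged-periodogram method; the lecture does not prescribe an estimator, so the sketch below (scipy.signal.welch, with an assumed sampling rate and amplitudes) is only illustrative:

```python
import numpy as np
from scipy import signal

# Two-sine example from the slide: f1 = 2 Hz, f2 = 6 Hz
fs = 100.0                                  # assumed sampling rate
t = np.arange(0.0, 60.0, 1.0 / fs)
A1, A2 = 2.0, 1.0
x = A1 * np.sin(2 * np.pi * 2.0 * t) + A2 * np.sin(2 * np.pi * 6.0 * t)

# Welch estimate of the one-sided power spectral density G_x(f)
f, Gx = signal.welch(x, fs=fs, nperseg=2048)

# Integrating each spectral peak recovers the mean power of the
# corresponding sine, 0.5*A^2, as marked on the slide's figure.
df = f[1] - f[0]
print(np.sum(Gx[(f > 1) & (f < 3)]) * df)   # ≈ 0.5 * A1**2 = 2.0
print(np.sum(Gx[(f > 5) & (f < 7)]) * df)   # ≈ 0.5 * A2**2 = 0.5
```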
Joint probability density function:
[Figure: two signal records over 0 ≤ t ≤ 1 s, amplitude x(t) with levels x and x+dx marked and amplitude y(t) with levels y and y+dy marked; T_xy is the total time during which both signals stay inside their respective bands.]
$$p(x, y) = \lim_{\substack{\Delta x \to 0 \\ \Delta y \to 0}} \frac{1}{\Delta x\,\Delta y}\left[\lim_{T\to\infty} \frac{T_{xy}}{T}\right]$$

Joint cumulative probability:
$$P(x, y) = \Pr\left[x(t) \le x,\ y(t) \le y\right] = \int_{-\infty}^{x}\int_{-\infty}^{y} p(\xi, \eta)\,d\xi\,d\eta$$
[Figure: surface plot of an example joint probability density p(x, y) over −5 ≤ x, y ≤ 5.]
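A sketch of the empirical counterpart: the joint density as a normalized two-dimensional histogram (numpy.histogram2d), with the joint cumulative probability obtained by summation; the correlated Gaussian data are an illustrative assumption:

```python
import numpy as np

# Illustrative pair of correlated Gaussian signals
rng = np.random.default_rng(3)
x = rng.standard_normal(100_000)
y = 0.6 * x + 0.8 * rng.standard_normal(100_000)

# Joint density p(x, y) as a normalized 2-D histogram: the discrete
# analogue of lim (T_xy / T) / (dx * dy)
p_xy, x_edges, y_edges = np.histogram2d(x, y, bins=50, density=True)

# Joint cumulative probability P(x, y) by accumulating density * area
dx = np.diff(x_edges)[:, None]
dy = np.diff(y_edges)[None, :]
P_xy = np.cumsum(np.cumsum(p_xy * dx * dy, axis=0), axis=1)
print(P_xy[-1, -1])   # ≈ 1.0 over the whole (x, y) plane
```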
Cross-correlation:
$$K_{xy}(\tau) = \lim_{T\to\infty} \frac{1}{T}\int_0^T x(t)\,y(t+\tau)\,dt$$
$$R_{xy}(\tau) = \lim_{T\to\infty} \frac{1}{T}\int_0^T \left(x(t) - \mu_x\right)\left(y(t+\tau) - \mu_y\right)dt$$

Example:
$$y(t) = 1.0 \cdot \sin\left(2\pi \cdot 1\,\mathrm{Hz}\cdot t\right) + 1.3 \cdot \mathrm{randn}(0, 1)$$
[Figures: the noisy signal y(t) over 0 ≤ t ≤ 4 s, and its cross-correlation over −2 s ≤ τ ≤ 2 s, an oscillation at 1 Hz that recovers the periodic component hidden in the noise.]
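A minimal sketch of the time-average estimate of R_xy(τ); pairing the noisy y(t) from the slide with a clean 1 Hz reference sine is an assumption made for illustration:

```python
import numpy as np

def cross_correlation(x, y, max_lag, fs):
    """Biased time-average estimate of R_xy(tau) for lags -max_lag..max_lag."""
    x = np.asarray(x, float) - np.mean(x)
    y = np.asarray(y, float) - np.mean(y)
    N = x.size
    lags = np.arange(-max_lag, max_lag + 1)
    R = np.empty(lags.size)
    for i, k in enumerate(lags):
        if k >= 0:
            R[i] = np.sum(x[:N - k] * y[k:]) / N
        else:
            R[i] = np.sum(x[-k:] * y[:N + k]) / N
    return lags / fs, R

fs = 100.0
t = np.arange(0.0, 4.0, 1.0 / fs)
rng = np.random.default_rng(4)
x = np.sin(2 * np.pi * 1.0 * t)                  # clean 1 Hz reference
y = 1.0 * np.sin(2 * np.pi * 1.0 * t) + 1.3 * rng.standard_normal(t.size)

tau, Rxy = cross_correlation(x, y, max_lag=200, fs=fs)   # lags -2 s .. 2 s
```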
Spectral density function:
$$S_{xy}(jf) = \int_{-\infty}^{\infty} R_{xy}(\tau)\, e^{-j 2\pi f \tau}\, d\tau$$
[Figure: magnitudes of the estimated cross-spectral density Ŝ_xy(jf) and auto-spectral density Ŝ_x(f) over 0–20 Hz.]

Transfer function:
$$S_{xy}(jf) = H_{xy}(jf)\, S_x(f) \quad\Rightarrow\quad H_{xy}(jf) = \frac{S_{xy}(jf)}{S_x(f)}$$

Example – second-order system:
$$H_{xy}(jf) = \frac{k\,\omega_n^2}{\left(j 2\pi f\right)^2 + 2\,\zeta\,\omega_n \left(j 2\pi f\right) + \omega_n^2}$$
[Figure: Nyquist plot of H_xy(jf), Imag[H_xy(jf)] versus Real[H_xy(jf)], traced from f = 0 Hz to f = 50 Hz.]
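A sketch of the spectral estimate Ĥ_xy(jf) = Ŝ_xy(jf) / Ŝ_x(f) using scipy.signal.csd and scipy.signal.welch; the white-noise excitation and the values of k, ζ, ω_n are illustrative assumptions:

```python
import numpy as np
from scipy import signal

fs = 500.0
rng = np.random.default_rng(5)
x = rng.standard_normal(200_000)            # white-noise excitation

# Illustrative second-order system k*wn^2 / (s^2 + 2*zeta*wn*s + wn^2),
# discretized so the response y(t) can be simulated sample by sample
k, zeta, wn = 1.0, 0.2, 2 * np.pi * 50.0
sysd = signal.cont2discrete(([k * wn**2], [1.0, 2 * zeta * wn, wn**2]),
                            dt=1.0 / fs, method='bilinear')
y = signal.lfilter(sysd[0].ravel(), sysd[1].ravel(), x)

# Transfer function estimate: cross-spectral over auto-spectral density
f, Sxy = signal.csd(x, y, fs=fs, nperseg=4096)
_, Sx = signal.welch(x, fs=fs, nperseg=4096)
Hxy = Sxy / Sx                              # complex estimate of H_xy(jf)
```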
Statistical Basis of Data Analysis
Mathematical statistics deals with gaining information
from data. In practice, data often contain some
randomness or uncertainty. Statistics handles such data
using methods of probability theory.
Mathematical statistics studies the distributions of random quantities.
In applying statistics to a scientific, industrial, or societal problem,
one begins with a process or population to be studied.
This might be a population of people in a country, of crystal
grains in a rock, or of goods manufactured by a particular factory
during a given period.
It may instead be a process observed at various times; data collected
about this kind of "population" constitute what is called a
time series.
For practical reasons, one usually studies a chosen subset of the
population, called a sample. Data are collected about the sample
in an observational or measurement experiment.
The data are then subjected to statistical analysis, which serves two
related purposes: description and inference.
•Descriptive statistics can be used to summarize the data,
either numerically or graphically, to describe the sample.
Basic examples of numerical descriptors include
the mean and standard deviation.
Graphical summarizations include various kinds of charts
and graphs.
•Inferential statistics is used to model patterns in the data,
accounting for randomness and drawing inferences about the
larger population. These inferences may take the form of answers
to yes/no questions (hypothesis testing), estimates of numerical
characteristics (estimation), descriptions of association (correlation)
or modelling of relationships (regression).
Mathematical statistics:
• hypothesis tests
• estimation theory
- parametric estimation
- non-parametric estimation
- point estimation
- interval estimation
hypothesis tests
A statistical hypothesis test, or more briefly a hypothesis test, is an algorithm for deciding for or against the hypothesis in a way that minimizes certain risks. The only conclusions that may be drawn from the test are (see the sketch below):
• there is not enough evidence to reject the hypothesis, or
• the hypothesis is false.
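A minimal sketch of this decision rule using a one-sample Student's t test from scipy.stats; the data, the hypothesised mean, and the risk level α are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
sample = 5.1 + 0.4 * rng.standard_normal(25)    # illustrative measurements

# Test the hypothesis H0: the population mean equals 5.0
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)

alpha = 0.05                # accepted risk of an error of the first kind
if p_value < alpha:
    print("Hypothesis rejected (treated as false at this risk level).")
else:
    print("Not enough evidence to reject the hypothesis.")
```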
estimation theory
Estimation theory is a branch of statistics that deals with estimating the values of parameters based on measured/empirical data. The parameters describe the physical object or signal under study.
non-parametric estimation
Non-parametric estimation is a statistical method that allows determination of the chosen characteristic, understood as a set of points in a predefined coordinate system (without any functional description).
parametric estimation
Parametric estimation is a statistical method that allows determination of the chosen parameters describing the analysed signal or object.
point estimation
In statistics, point estimation involves the use of sample data to calculate a single value (known as an estimate) which is to serve as a "best guess" for an unknown (fixed or random) population parameter.
interval estimation
In statistics, interval estimation is the use of sample data
to calculate an interval of possible (or probable) values of
an unknown population parameter.
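A short sketch combining both ideas for the mean of a sample: the point estimate and a 95% confidence interval built from the Student's t quantile (the data are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
sample = 5.0 + 0.4 * rng.standard_normal(30)   # illustrative measurements

# Point estimates of the population mean and variance
mean_hat = np.mean(sample)
var_hat = np.var(sample, ddof=1)               # unbiased 1/(N-1) estimator

# Interval estimate: 95% confidence interval for the mean, using the
# Student's t quantile with f = N - 1 degrees of freedom
N = sample.size
t_q = stats.t.ppf(0.975, df=N - 1)
half_width = t_q * np.sqrt(var_hat / N)
print(f"mean = {mean_hat:.3f} ± {half_width:.3f} (95% confidence)")
```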
Statistical Basis of Data Analysis
Random quantities:
A random event may or may not occur as the result of an experiment.
A random variable is a function that assigns unique numerical values to all possible outcomes of a random experiment under fixed conditions.
A random variable is not a variable but rather a function
that maps events to numbers.
A stochastic process, or sometimes random process, is the opposite of a deterministic process. Instead of dealing with only one possible 'reality' of how the process might evolve over time, in a stochastic or random process there is some indeterminacy in its future evolution, described by probability distributions. This means that even if the initial condition (or starting point) is known, there are many paths the process might follow, but some paths are more probable than others.
All elements belonging to the defined set are called the general population. For instance: all citizens of a defined country. A sample population is a chosen subset of the general population.
[Figure: the sample population shown as a subset of the general population.]
Estimator properties – ideal estimator.
[Figure: illustration of estimator properties; the spread of the estimates (variance) and their offset from the true value (bias error).]
Unbiased estimators:
This means that the average of the estimates from an
increasing number of experiments should converge to the
true parameter values, assuming that the noise characteristics
are constant during the experiments.
A more precise mathematical description would be:
An estimator is called "unbiased" if its expected value is equal to the true value.
$$E\left[\hat{\Theta}\right] = \Theta$$
[Figure: scatter of estimates versus the number of samples (0–1600); the average of the estimates settles around the true value.]
Asymptotically unbiased estimator:
Some estimators are biased, but in general the expected value of an estimator should converge to the true value as the number of measurements increases to infinity.
Again this can be formulated more carefully:
An estimator is called "asymptotically unbiased" if
$$\lim_{N\to\infty} E\left[\hat{\Theta}(N)\right] = \Theta$$
with N the number of measurements.
[Figure: estimates of an asymptotically unbiased estimator versus the number of samples (0–44); the estimates converge toward the true value marked on the plot.]
Efficient estimators.
Of two estimators, the one with the smaller mean-square error is called more efficient:
$$E\left[(\hat{\Theta}_k - \Theta)^2\right] \le E\left[(\hat{\Theta}_i - \Theta)^2\right]$$
The mean-square error decomposes into the variance of the estimator and the squared bias:
$$E\left[(\hat{\Theta} - \Theta)^2\right] = E\left[\left(\hat{\Theta} - E[\hat{\Theta}]\right)^2\right] + \left(E[\hat{\Theta}] - \Theta\right)^2 = \sigma_{\hat{\Theta}}^2 + b^2$$
Consistent estimator.
An estimator is called consistent if
$$\lim_{N\to\infty} \Pr\left[\left|\hat{\Theta} - \Theta\right| \ge \varepsilon\right] = 0$$
for each $\varepsilon > 0$.
Robust estimator
An estimator is called a robust estimator if its properties are
still valid when the assumptions made in its construction are
no longer applicable.