STATISTICAL DATA ANALYSIS

Prof. Janusz Gajda
Dept. of Instrumentation and Measurement

Plan of the lecture
• Classification of the measuring signals according to their statistical properties.
• Definition of basic parameters and characteristics of the stochastic signals (expected value, variance, mean-square value, probability density function, power spectral density, joint probability density function, cross-correlation function, spectral density function, transfer function).
• Interpretation of the basic statistical characteristics.
• Elements of statistics: estimation theory and hypothesis verification theory, parametric and non-parametric estimation, point and interval estimation. Desirable properties of an estimator: unbiased, efficient, consistent, robust.
• Application of the point estimation and interval estimation methods in determination of estimates of these parameters and characteristics.
• Estimators of basic parameters and characteristics of random signals: mean value and variance, probability density function, autocorrelation and cross-correlation function, power spectral density and mutual spectral density, coherence function, transfer function.
• Analysis of the statistical properties of those estimators.
• Determination of the confidence intervals of basic statistical parameters for an assumed confidence level.
• Statistical hypotheses and their verification.
• Errors of the first and second kind observed during the verification process.

Classification of the measuring signals according to their statistical properties

Deterministic signals:
• periodic signals: mono-harmonic and poly-harmonic signals;
• non-periodic signals: almost periodic and transient signals.

Periodic signals

Mono-harmonic signals:
$$x(t) = A \sin(\omega_0 t + \varphi)$$
where: $A$ – signal amplitude, $\omega_0 = 2\pi f_0$ – angular frequency, $\varphi$ – initial phase angle.

Poly-harmonic signals:
$$x(t) = \sum_n A_n \sin(n\omega_0 t + \varphi_n)$$
where: $A_n$ – amplitude of the n-th harmonic component, $\omega_0 = 2\pi f_0$ – basic angular frequency, $\varphi_n$ – initial phase angle of the n-th component.

Frequency spectrum of the periodic signals:
$$x(t) = \sum_{n=1}^{\infty} X_n \sin(2\pi f_n t + \varphi_n)$$
where every frequency ratio $f_n / f_k$ is a rational number.

Classification of the measuring signals according to their statistical properties

Stochastic signals:
• stationary signals: ergodic and non-ergodic signals;
• non-stationary signals: different classes of non-stationary signals.

[Figure: a set of five realizations x1(t), …, x5(t) of a random quantity plotted against time, with the values xi(t1) and xi(t2) marked at two observation instants t1 and t2.]

Basic statistical characteristics

Mean-square value:
$$\psi_x^2 = E\{x^2(t)\} = \lim_{T\to\infty} \frac{1}{T}\int_0^T x^2(t)\,dt$$

Root-mean-square value:
$$x_{\mathrm{rms}} = \sqrt{\psi_x^2}$$

Expected value:
$$\mu_x = E\{x(t)\} = \lim_{T\to\infty} \frac{1}{T}\int_0^T x(t)\,dt$$

Variance:
$$\sigma_x^2 = E\{(x(t)-\mu_x)^2\} = \lim_{T\to\infty} \frac{1}{T}\int_0^T (x(t)-\mu_x)^2\,dt$$

Probability function:
$$\Pr[x < x(t) \le x+\Delta x] = \lim_{T\to\infty} \frac{T_x}{T}$$
where $T_x$ is the total time the signal spends inside the interval $(x, x+\Delta x]$.

Probability density function:
$$p(x) = \lim_{\Delta x\to 0} \frac{\Pr[x < x(t) \le x+\Delta x]}{\Delta x} = \lim_{\Delta x\to 0} \frac{1}{\Delta x} \lim_{T\to\infty} \frac{T_x}{T}$$

Most popular distributions:

Standardised normal distribution:
$$p(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$$

Normal distribution:
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$$

[Figures: probability density p(x) of the normal distribution for different mean values and standard deviations; the shaded areas illustrate that the probability of falling within one standard deviation of the mean is about 0.68, and within two standard deviations about 0.95.]

Normal distribution – cumulative probability:
$$P(x) = \int_{-\infty}^{x} p(\xi)\,d\xi$$

[Figure: cumulative probability P(x) of the normal distribution.]
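For an ergodic signal these time averages can be estimated from a single sampled realization. The short Python sketch below is an addition to the lecture text, not part of it; it assumes NumPy is available and uses a synthetic Gaussian signal with arbitrarily chosen mu and sigma. It estimates the mean, variance, mean-square and root-mean-square values and checks a histogram estimate of p(x) against the normal density:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stationary, ergodic signal: Gaussian noise with a known
# mean and standard deviation (both are assumptions of this sketch).
mu, sigma, N = 1.0, 2.0, 100_000
x = rng.normal(mu, sigma, N)

# Time averages over one sampled realization replace the limits T -> infinity.
mean_est = np.mean(x)                      # estimate of mu_x
var_est = np.mean((x - mean_est) ** 2)     # estimate of sigma_x^2
msq_est = np.mean(x ** 2)                  # estimate of psi_x^2
rms_est = np.sqrt(msq_est)                 # estimate of x_rms

# Histogram-based estimate of the probability density p(x):
# counts / (N * bin width) plays the role of lim Tx / (T * dx).
counts, edges = np.histogram(x, bins=50, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
p_normal = np.exp(-(centers - mu) ** 2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

print(f"mean: {mean_est:.3f} (true {mu})")
print(f"variance: {var_est:.3f} (true {sigma**2})")
print(f"mean-square: {msq_est:.3f} (true {mu**2 + sigma**2})")
print(f"rms: {rms_est:.3f}")
print(f"max |p_hat - p|: {np.abs(counts - p_normal).max():.4f}")
```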
Normal distribution – quantiles:
The quantile $x_p$ of order $p$ is the value for which
$$\Pr[x \le x_p] = \int_{-\infty}^{x_p} p(\xi)\,d\xi = P(x_p) = p$$

[Figure: quantiles x_p of the standardised normal distribution plotted against the cumulative probability P(x_p).]

Most popular distributions:

Chi-square distribution ($\chi^2$):
$$p(\chi^2;\, n) = \frac{1}{2^{n/2}\,\Gamma(n/2)} \left(\chi^2\right)^{n/2-1} e^{-\chi^2/2}$$
where $n$ – number of degrees of freedom.

[Figures: probability density of the chi-square distribution for n = 1, 2, 3 and 10 degrees of freedom, and its cumulative probability for n = 2, 3, 4, …, 20.]

Most popular distributions:

t-Student distribution – probability density:
$$p(t) = \frac{\Gamma\!\left(\frac{f+1}{2}\right)}{\sqrt{f\pi}\;\Gamma\!\left(\frac{f}{2}\right)} \left(1+\frac{t^2}{f}\right)^{-\frac{f+1}{2}}$$
where $f$ – number of degrees of freedom.

[Figures: probability density p(t) for f = 1, 2 and 10 degrees of freedom, and the cumulative probability P(t) for f = 2, 3, 4, …, 10.]

Auto-correlation function:
$$K_x(\tau) = \lim_{T\to\infty} \frac{1}{T}\int_0^T x(t)\,x(t+\tau)\,dt$$
$$R_x(\tau) = \lim_{T\to\infty} \frac{1}{T}\int_0^T \big(x(t)-\mu_x\big)\big(x(t+\tau)-\mu_x\big)\,dt$$

[Figures: auto-correlation functions R_x(τ) of three different signals.]

Power spectral density:
$$G_x(f) = \lim_{\Delta f\to 0} \frac{1}{\Delta f} \lim_{T\to\infty} \frac{1}{T}\int_0^T x^2(t, f, \Delta f)\,dt$$
where $x(t, f, \Delta f)$ is the part of the signal passed by an ideal band-pass filter of width $\Delta f$ centred at frequency $f$, and
$$S_x(f) = \int_{-\infty}^{\infty} R_x(\tau)\, e^{-j 2\pi f \tau}\,d\tau$$

Example:
$$x(t) = A_1 \sin(2\pi f_1 t + \varphi_1) + A_2 \sin(2\pi f_2 t + \varphi_2), \qquad f_1 = 2\,\mathrm{Hz},\; f_2 = 6\,\mathrm{Hz}$$

[Figure: power spectral density G(f) of this signal, with spectral components of area 0.5 A_1^2 at 2 Hz and 0.5 A_2^2 at 6 Hz.]

Joint probability density function:
$$p(x, y) = \lim_{\substack{\Delta x\to 0\\ \Delta y\to 0}} \frac{1}{\Delta x\,\Delta y} \lim_{T\to\infty} \frac{T_{xy}}{T}$$
where $T_{xy}$ is the total time during which $x(t)$ stays inside $(x, x+\Delta x]$ and, simultaneously, $y(t)$ stays inside $(y, y+\Delta y]$.

[Figures: realizations x(t) and y(t) with the intervals (x, x+dx] and (y, y+dy] marked, and the resulting joint probability density surface.]

Joint cumulative probability:
$$P(x, y) = \Pr[x(t) \le x,\; y(t) \le y] = \int_{-\infty}^{x}\int_{-\infty}^{y} p(\xi, \eta)\,d\xi\,d\eta$$

Cross-correlation:
$$K_{xy}(\tau) = \lim_{T\to\infty} \frac{1}{T}\int_0^T x(t)\,y(t+\tau)\,dt$$
$$R_{xy}(\tau) = \lim_{T\to\infty} \frac{1}{T}\int_0^T \big(x(t)-\mu_x\big)\big(y(t+\tau)-\mu_y\big)\,dt$$

Example: $y(t) = 1.0 \cdot \sin(2\pi \cdot 1\,\mathrm{Hz} \cdot t) + 1.3 \cdot \mathrm{randn}(0; 1)$

[Figures: a realization of y(t) and the resulting cross-correlation function.]

Spectral density function:
$$S_{xy}(jf) = \int_{-\infty}^{\infty} R_{xy}(\tau)\, e^{-j 2\pi f \tau}\,d\tau$$

[Figure: estimated cross spectral density Ŝ_xy(jf) and auto spectral density Ŝ_x(f) versus frequency.]

Transfer function:
$$S_{xy}(jf) = H_{xy}(jf)\, S_x(f) \qquad\Rightarrow\qquad H_{xy}(jf) = \frac{S_{xy}(jf)}{S_x(f)}$$

Example – transfer function of a second-order system:
$$H_{xy}(jf) = \frac{k\,\omega_n^2}{(j 2\pi f)^2 + 2\zeta\omega_n\,(j 2\pi f) + \omega_n^2}$$

[Figure: Nyquist plot of H_xy(jf), i.e. Imag[H_xy(jf)] versus Real[H_xy(jf)], traced from f = 0 Hz to f = 50 Hz.]
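As a numerical cross-check of the power-spectral-density example above (an addition, not part of the original slides), the following Python sketch uses scipy.signal.welch to estimate G(f) for a two-sine signal; integrating each spectral peak should recover approximately 0.5·A1² and 0.5·A2². The sampling rate, record length and amplitudes are assumptions of the sketch:

```python
import numpy as np
from scipy import signal

fs, T = 100.0, 200.0                       # sampling rate [Hz], record length [s] (assumed)
t = np.arange(0, T, 1 / fs)
A1, f1, A2, f2 = 2.0, 2.0, 1.0, 6.0        # amplitudes and frequencies (assumed)
x = A1 * np.sin(2 * np.pi * f1 * t) + A2 * np.sin(2 * np.pi * f2 * t)

# One-sided power spectral density estimate (Welch's averaged periodogram).
f, Gx = signal.welch(x, fs=fs, nperseg=4096)

# The area under each peak approximates the power of that component,
# which for a sinusoid equals 0.5 * amplitude**2.
df = f[1] - f[0]
p1 = Gx[(f > f1 - 1) & (f < f1 + 1)].sum() * df
p2 = Gx[(f > f2 - 1) & (f < f2 + 1)].sum() * df
print(f"peak at {f1} Hz: {p1:.3f} (expected {0.5 * A1**2})")
print(f"peak at {f2} Hz: {p2:.3f} (expected {0.5 * A2**2})")
```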
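The relation S_xy(jf) = H_xy(jf)·S_x(f) suggests estimating the transfer function as a ratio of two spectral estimates. Below is a minimal sketch of that idea, assuming SciPy is available; a discrete Butterworth low-pass filter stands in for the second-order system of the slides, and all numeric choices are illustrative:

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(1)
fs = 100.0
x = rng.normal(0.0, 1.0, 200_000)          # white-noise excitation

# A known discrete system stands in for the plant; here a 2nd-order
# Butterworth low-pass with a 10 Hz cut-off (an assumption of the sketch).
b, a = signal.butter(2, 10.0, fs=fs)
y = signal.lfilter(b, a, x)

# Estimate S_x(f) and S_xy(jf), then H_xy(jf) = S_xy(jf) / S_x(f).
f, Sx = signal.welch(x, fs=fs, nperseg=1024)
f, Sxy = signal.csd(x, y, fs=fs, nperseg=1024)
H_est = Sxy / Sx

# Compare with the exact frequency response of the filter.
_, H_true = signal.freqz(b, a, worN=f, fs=fs)
print(f"max |H_est - H_true|: {np.abs(H_est - H_true).max():.3f}")
```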
Statistical Base of Data Analysis

Mathematical statistics deals with gaining information from data. In practice, data often contain some randomness or uncertainty; statistics handles such data using methods of probability theory. Mathematical statistics examines the distributions of random quantities.

In applying statistics to a scientific, industrial, or societal problem, one begins with a process or population to be studied. This might be a population of people in a country, of crystal grains in a rock, or of goods manufactured by a particular factory during a given period. It may instead be a process observed at various times; data collected about this kind of "population" constitute what is called a time series.

For practical reasons, one usually studies a chosen subset of the population, called a sample. Data are collected about the sample in an observational or measurement experiment. The data are then subjected to statistical analysis, which serves two related purposes: description and inference.

• Descriptive statistics can be used to summarize the data, either numerically or graphically, to describe the sample. Basic examples of numerical descriptors include the mean and standard deviation. Graphical summarizations include various kinds of charts and graphs.
• Inferential statistics is used to model patterns in the data, accounting for randomness and drawing inferences about the larger population. These inferences may take the form of answers to yes/no questions (hypothesis testing), estimates of numerical characteristics (estimation), descriptions of association (correlation) or modelling of relationships (regression).

Mathematical statistics divides into:
• hypothesis tests;
• estimation theory:
  – parametric and non-parametric estimation,
  – point and interval estimation.

Hypothesis tests: A statistical hypothesis test, or more briefly, hypothesis test, is an algorithm to state the alternative (for or against the hypothesis) in a way that minimizes certain risks. The only conclusions that may be drawn from the test are that:
• there is not enough evidence to reject the hypothesis, or
• the hypothesis is false.

Estimation theory: Estimation theory is a branch of statistics that deals with estimating the values of parameters based on measured/empirical data. The parameters describe the physical object under study and answer a question posed by the experimenter.

Non-parametric estimation: Non-parametric estimation is a statistical method that allows determination of a chosen characteristic, understood as a set of points in a predefined coordinate system (without any functional description).

Parametric estimation: Parametric estimation is a statistical method that allows determination of chosen parameters describing the analysed signal or object.

Point estimation: In statistics, point estimation involves the use of sample data to calculate a single value (known as an estimate) which serves as a "best guess" for an unknown (fixed or random) population parameter.

Interval estimation: In statistics, interval estimation is the use of sample data to calculate an interval of possible (or probable) values of an unknown population parameter.
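To make the point/interval distinction concrete, the sketch below (an addition to the lecture text, assuming SciPy is available and using a synthetic sample) computes a point estimate of a population mean together with a 95% confidence interval built from the t-Student distribution introduced earlier:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
sample = rng.normal(5.0, 2.0, 30)          # a small synthetic sample from N(5, 2^2)

# Point estimation: a single "best guess" for the population mean.
mean_hat = sample.mean()

# Interval estimation: a 95% confidence interval built from the
# t-Student distribution with f = n - 1 degrees of freedom.
n = sample.size
s = sample.std(ddof=1)                     # unbiased sample standard deviation
t_crit = stats.t.ppf(0.975, df=n - 1)      # quantile t_{0.975}
half_width = t_crit * s / np.sqrt(n)

print(f"point estimate: {mean_hat:.3f}")
print(f"95% confidence interval: "
      f"[{mean_hat - half_width:.3f}, {mean_hat + half_width:.3f}]")
```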
Statistical Base of Data Analysis

Random quantities:
A random event may appear or not as a result of an experiment.
A random variable is a function which assigns unique numerical values to all possible outcomes of a random experiment under fixed conditions. A random variable is not a variable but rather a function that maps events to numbers.
A stochastic process, or sometimes random process, is the opposite of a deterministic process. Instead of dealing with only one possible 'reality' of how the process might evolve over time, in a stochastic or random process there is some indeterminacy in its future evolution, described by probability distributions. This means that even if the initial condition (or starting point) is known, there are many paths the process might take, but some paths are more probable than others.

All elements belonging to the defined set are called the general population. For instance: all citizens of a defined country. A sample population is a chosen subset of the general population.

[Figure: the sample population as a subset of the general population.]

Estimator properties – ideal estimator.

[Figure: the error of an estimator decomposed into its bias and variance components.]

Unbiased estimators:
The average of the estimates from an increasing number of experiments should converge to the true parameter values, assuming that the noise characteristics are constant during the experiments. A more precise mathematical description would be: an estimator is called "unbiased" if its expected value is equal to the true value:
$$E\{\hat{\theta}\} = \theta$$

[Figure: estimates scattering around the true value as the number of samples grows.]

Asymptotically unbiased estimator:
Some estimators are biased, but in general the expected value of an estimator should converge to the true value as the number of measurements increases to infinity. Again this can be formulated more carefully: an estimator is called "asymptotically unbiased" if
$$\lim_{N\to\infty} E\{\hat{\theta}(N)\} = \theta$$
with $N$ – the number of measurements.

[Figure: a biased estimate approaching the true value as the number of samples increases.]

Efficient estimators:
The estimator with the smaller mean-square error is called the more efficient one: $\hat{\theta}_k$ is more efficient than $\hat{\theta}_i$ if
$$E\{(\hat{\theta}_k - \theta)^2\} \le E\{(\hat{\theta}_i - \theta)^2\}$$
The mean-square error splits into a variance term and a bias term:
$$E\{(\hat{\theta} - \theta)^2\} = \sigma_{\hat{\theta}}^2 + b^2, \qquad b = E\{\hat{\theta}\} - \theta$$

Consistent estimator:
An estimator is called consistent if
$$\lim_{N\to\infty} \Pr\!\left[\,|\hat{\theta}(N) - \theta| \ge \varepsilon\,\right] = 0$$
for each $\varepsilon > 0$.

Robust estimator:
An estimator is called a robust estimator if its properties are still valid when the assumptions made in its construction are no longer applicable.
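These estimator properties can be observed numerically. The following Python sketch is illustrative only (it assumes NumPy; all sample sizes and contamination levels are arbitrary choices): it shows the bias of the variance estimator that divides by n, the consistency of the sample mean, and the robustness of the median compared with the mean under outliers:

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, runs = 0.0, 1.0, 10, 20_000

# Bias: average many variance estimates over repeated experiments.
samples = rng.normal(mu, sigma, (runs, n))
var_biased = samples.var(axis=1, ddof=0).mean()    # divides by n
var_unbiased = samples.var(axis=1, ddof=1).mean()  # divides by n - 1
print(f"E[var, ddof=0] = {var_biased:.3f}  (biased, true value 1.0)")
print(f"E[var, ddof=1] = {var_unbiased:.3f}  (unbiased)")

# Consistency: the sample mean concentrates around mu as N grows.
for N in (10, 100, 1_000):
    means = rng.normal(mu, sigma, (2_000, N)).mean(axis=1)
    prob = (np.abs(means - mu) >= 0.1).mean()
    print(f"N={N:>5}: Pr(|mean - mu| >= 0.1) ~ {prob:.3f}")

# Robustness: contaminate 5% of the data with large outliers.
clean = rng.normal(mu, sigma, 1_000)
dirty = clean.copy()
dirty[:50] += 50.0
print(f"mean:   clean {clean.mean():+.2f}, with outliers {dirty.mean():+.2f}")
print(f"median: clean {np.median(clean):+.2f}, with outliers {np.median(dirty):+.2f}")
```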