Probability and Estimation

Probability and Statistics
• Thought experiment
  – Know the population
  – Will sample n members at random
• What can we say about probability of sample statistics?
  – p(M = x) for some value x
  – p(M1 > M2)
  – etc.
• All hypothetical
  – Based on assumptions about population
  – Worked out mathematically
  – Not based on any real data
• Logic behind all inferential statistics
  – IF the population has a certain distribution, what will the probabilities be?

Marbles
• Half red, half blue. p(red) = 1/2
• 2/3 red, 1/3 blue. p(red) = 2/3
• Probability is just a bag of marbles
  – Population = bag
  – Sample = drawing from bag
• Sample four marbles
  – p(2 reds, 2 blues) = 6/16 = .375

Probability Distributions
• Probability distribution
  – Distribution defined only by probabilities (not frequencies)
  – Can think of as infinite population
• Discrete vs. continuous probability distributions
  – Discrete: like histogram, but p(x) instead of f(x)
    • Probability is height of each bar
  – Continuous: density function
    • Probability is area under the curve
• Random variable
  – Variable defined by probability distribution
  – Can take on one of many values
  – Each value (or range of values) has a probability
[Figure: bar chart of probability (0.00 to 0.40) against number of reds (0 to 4)]

Expected Value
• Expected value
  – Mean of a probability distribution
  – \( E(R) = \sum_x x \, p(R = x) = \sum_x x \, p(x) \)

  x         0       1      2      3      4       Sum
  p(x)      .0625   .25    .375   .25    .0625
  x·p(x)    0       0.25   0.75   0.75   0.25    2.00

[Figure: bar chart of probability p(R = x) against R (number of reds), 0 to 4]

Properties of Expected Value
• Multiplying by a fixed number
  – I give you $3 per red: E(3R) = 3E(R)
  – E(cR) = cE(R)
• Adding a fixed number
  – I give you $1 per red, plus a bonus $1: E(R + 1) = E(R) + 1
  – E(R + c) = E(R) + c
• Sum of two random variables
  – E(R + G) = E(R) + E(G)
• Relationship to mean
  – Expected value of any random variable equals its mean
  – \( E(R) = \sum_x x \, p(R = x) = \mathrm{mean}(R) \)

Sampling from a Population
• Sampling a single item (n = 1)
  – What is E(X)?
  – \( E(X) = \sum_x x \, p(x) = \mu \)
• Sampling many items
  – What is \( E\left(\sum_{\text{sample}} X\right) \)?
    \( E\left(\sum_{\text{sample}} X\right) = E(X_1 + \dots + X_n) = E(X_1) + \dots + E(X_n) = \mu + \mu + \dots + \mu = n\mu \)
  – What is E(mean(X))?
    \( E(M) = E\left(\frac{\sum_{\text{sample}} X}{n}\right) = \frac{E\left(\sum_{\text{sample}} X\right)}{n} = \frac{n\mu}{n} = \mu \)
• Unbiased estimators
  – Statistic a is unbiased estimator of parameter α if E(a) = α
  – Sample mean is unbiased estimator of population mean
[Figure: bar chart of probability p(X = x) against x (0 to 4)]

Biased Estimators: The Example of Variance
• Population variance
  – \( \sigma^2 = \frac{\sum_{\text{pop}} (X - \mu)^2}{N} = \mathrm{mean}\left((X - \mu)^2\right) = E\left((X - \mu)^2\right) \)
  – \( (X - \mu)^2 \) is unbiased estimator of \( \sigma^2 \)
• Many observations: \( \frac{\sum_{\text{sample}} (X - \mu)^2}{n} \)
  – \( E\left(\frac{\sum_{\text{sample}} (X - \mu)^2}{n}\right) = \sigma^2 \)
  – Problem: We don't know μ
• What about \( \frac{\sum_{\text{sample}} (X - M)^2}{n} \)?
• Sample mean always shifted toward sample
  – \( \frac{\sum_{\text{sample}} (X - M)^2}{n} < \frac{\sum_{\text{sample}} (X - \mu)^2}{n} \)
  – \( E\left(\frac{\sum_{\text{sample}} (X - M)^2}{n}\right) < \sigma^2 \)
  – \( \frac{\sum_{\text{sample}} (X - M)^2}{n} \) is biased estimate of \( \sigma^2 \)
  – Not a good definition for sample variance

Sample Variance
• Goal: Define sample variance to be unbiased estimator of population variance
  – \( E(s^2) = \sigma^2 \)
• Problem: Obvious answer is biased
  – \( E\left(\frac{\sum_{\text{sample}} (X - M)^2}{n}\right) < \sigma^2 \)
  – M is always closer to X than μ is
• Solution:
  – \( s^2 = \frac{\sum_{\text{sample}} (X - M)^2}{n - 1} \)
  – Unbiased: \( E(s^2) = \sigma^2 \)
• Sample standard deviation
  – \( s = \sqrt{\frac{\sum_{\text{sample}} (X - M)^2}{n - 1}} \)

Take-away Points
• Random variables and probability distributions
• Expected value
  – Formula: \( E(R) = \sum_x x \, p(R = x) \)
  – Relationship to mean
  – Adding and multiplying
• Samples and sample statistics as random variables
• Biased and unbiased estimators
  – M is an unbiased estimator of μ: E(M) = μ
  – \( \frac{\sum_{\text{sample}} (X - M)^2}{n} \) is a biased estimator of \( \sigma^2 \)
• Sample variance
  – Formula: \( s^2 = \frac{\sum_{\text{sample}} (X - M)^2}{n - 1} \)
  – Unbiased estimator of population variance
• Sample standard deviation: \( s = \sqrt{\frac{\sum_{\text{sample}} (X - M)^2}{n - 1}} \)
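
Worked check: expected value for the four-marble example. The sketch below recomputes the probabilities and expected value for the number of reds R in four draws, and checks the multiplying and adding properties. It assumes each draw is red with probability 1/2, independently of the others (the "infinite population" reading of the half-red bag); the names `pmf` and `expected_value` are illustrative only and do not come from the slides.

```python
from fractions import Fraction
from math import comb

# Assumption: 4 independent draws, each red with probability 1/2.
n_draws = 4
p_red = Fraction(1, 2)

# Binomial probabilities p(R = x) for x = 0..4 reds.
pmf = {
    x: comb(n_draws, x) * p_red**x * (1 - p_red) ** (n_draws - x)
    for x in range(n_draws + 1)
}

def expected_value(pmf, transform=lambda x: x):
    """E(transform(R)) = sum over x of transform(x) * p(R = x)."""
    return sum(transform(x) * p for x, p in pmf.items())

e_r = expected_value(pmf)
e_3r = expected_value(pmf, lambda x: 3 * x)      # $3 per red
e_r_plus_1 = expected_value(pmf, lambda x: x + 1)  # $1 per red plus a $1 bonus

print(pmf[2])       # 3/8, i.e. 6/16 = .375
print(e_r)          # 2
print(e_3r)         # 6, which equals 3 * E(R)
print(e_r_plus_1)   # 3, which equals E(R) + 1
```

Exact fractions are used rather than floats so the equalities E(3R) = 3E(R) and E(R + 1) = E(R) + 1 hold exactly, with no rounding.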
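
Worked check: biased and unbiased variance estimators. The sketch below lists every possible sample of size n from a small population and averages three statistics over all of them: the sample mean M, the sum of squared deviations from M divided by n, and the same sum divided by n − 1. The population values are hypothetical, chosen only for illustration, and the sketch assumes sampling with replacement so that all ordered samples are equally likely.

```python
from fractions import Fraction
from itertools import product

population = [0, 2, 4, 10]  # hypothetical values, for illustration only
n = 3                       # sample size

def mean(values):
    """Exact arithmetic mean as a Fraction."""
    return sum(values) / Fraction(len(values))

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample's own mean M."""
    m = mean(sample)
    return sum((x - m) ** 2 for x in sample)

# Population parameters.
mu = mean(population)
sigma2 = mean([(x - mu) ** 2 for x in population])

# Assumption: sampling with replacement, so all N**n ordered samples
# are equally likely and a plain average over them is an expected value.
samples = list(product(population, repeat=n))

exp_M = mean([mean(s) for s in samples])
exp_div_n = mean([sum_sq_dev(s) / n for s in samples])
exp_div_n_minus_1 = mean([sum_sq_dev(s) / (n - 1) for s in samples])

print(mu, exp_M)            # 4 4      -> E(M) equals mu
print(sigma2)               # 14
print(exp_div_n)            # 28/3     -> smaller than sigma^2 (biased)
print(exp_div_n_minus_1)    # 14       -> equals sigma^2 (unbiased)
```

Dividing by n underestimates the population variance by a factor of (n − 1)/n on average, and dividing by n − 1 removes exactly that factor.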