Download 09-15 lecture

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Probability and Estimation
9/15
Probability and Statistics
• Thought experiment
– Know the population
– Will sample n members at random
• What can we say about probability of sample statistics?
– p(M = x) for some value x
– p(M1 > M2)
– etc.
• All hypothetical
– Based on assumptions about population
– Worked out mathematically
– Not based on any real data
• Logic behind all inferential statistics
– IF the population has a certain distribution, what will the
probabilities be?
Marbles
• Half red, half blue. p(red) = 1/2
?
• 2/3 red, 1/3 blue. p(red) = 2/3
?
• Probability is just a bag of marbles
– Population = bag
– Sample = drawing from bag
• Sample four marbles
– p(2 reds, 2 blues) = 6/16
? = .375
Probability Distributions
• Probability distribution
– Distribution defined only by probabilities (not frequencies)
– Can think of as infinite population
• Discrete vs. continuous probability distributions
– Discrete: like histogram, but p(x) instead of f(x)
• Probability is height of each bar
– Continuous: density function
0.40
• Random variable
Probability
• Probability is area under the curve
0.30
– Variable defined by probability distribution
0.20
– Can take on one of many values
– Each value (or range of values) 0.10
has a probability
0.00
0
1
2
3
Number of reds
4
Expected Value
• Expected value
Mean of a probability distribution
E ( R)   x  p( R  x)
   x  p( x)
x
x
Probability
p(R = x)
0.40
x
0
1
2
3
4
0.30
0.20
0.10
0.00
0
1
2
3
R (number of reds)
4
p(x)
.0625
.25
.375
.25
.0625
xp(x)
0
0.25
0.75
0.75
0.25
 = 2.00
Properties of Expected Value
• Multiplying by a fixed number
– I give you $3 per red: E(3R) = 3E(R)
?
– E(cR) = cE(R)
• Adding a fixed number
– I give you $1 per red, plus a bonus $1: E(R + 1) = E(R)
? +1
– E(R + c) = E(R) + c
• Sum of two random variables
– E(R + G) = E(R)
? + E(G)
• Relationship to mean
– Expected value of any random variable equals its mean
E ( R)   x  p ( R  x)
x
 mean( R)
Sampling from a Population
• Sampling a single item (n = 1)
– What is E(X)?
x
æ
ö
E ç å X ÷ = E ( X[1] + ...+ X[n])
è sample ø
= E(X[1]) + ...+ E(X[n])
• Sampling many items
æ
ö
– What is E ç å X ÷ ?
è sample ø
= m + m + ...+ m = n × m
æ
ö
æ å X ö Eç å X ÷
è sample ø n × m
sample
ç
÷
E
=
=
=m
ç n ÷
n
n
è
ø
0.40
– What is E(mean(X))?
0.30
 E(M) = 
Probability
p(X = x)
E(X) = å x × p(x) = m
0.20
• Unbiased estimators
0.10
– Statistic a is unbiased estimator of parameter a if E(a) = a
– Sample mean is 0.00
unbiased estimator of population mean
0
1
2
x
3
4
Biased Estimators: The Example of Variance
• Population variance
– (X-)2 is unbiased estimator of s2
s2 =
å ( X - m) 2
pop
N
æ å (X -m)22 ö
å (X -m)
• Many observations:
2
2
ç
÷
sample
E
=
s
=
mean
(
X
m)
n
– Problem: We don't know 
ç n ÷
è
ø
å (X - M) 2
= E ( X - m) 2
(
sample
n
(
?
sample

•
å ( X - M )2
sample
)
å (X - M) 2
• Sample mean always shifted toward sample
(X – )2
is biased estimate of
n
s2
– Not a good definition for sample variance
M
)
n
<
å (X - m) 2
sample
n
)22ö
æ åå(X(X-M
-M )
sample
E çç n ÷÷ < s 2
è n ø
Sample Variance
• Goal: Define sample variance to be unbiased estimator
of population variance
– E(s2) = s2
• Problem: Obvious answer is biased
)22ö
æ åå(X(X-M
-M )
– E çsample n . ÷ < s 2
ç n ÷
è
ø
– M is always closer to X than  is
• Solution:
– s. 2
=
å (X -M )2
sample
n-1
– Unbiased: E(s2) = s2
• Sample standard deviation
s=
å (X -M )2
sample
n-1
Take-away Points
• Random variables and probability distributions
• Expected value
– Formula: E ( R)   x  p( R  x)
x
– Relationship to mean
– Adding and multiplying
• Samples and sample statistics as random variables
• Biased and unbiased estimators
– M is an unbiased estimator of : E(M) = 
å (X -M )2
– sample
is a biased estimator of s2
n
• Sample variance å (X -M )2
– Formula:
s2 =
sample
n-1
– Unbiased estimator of population variance
• Sample standard deviation:
s=
å (X -M )2
sample
n-1
Related documents