Sampling and estimation
Petter Mostad
2005.09.26
The normal distribution
• The most used continuous probability distribution:
  – Many observations tend to approximately follow this distribution
  – It is easy and nice to do computations with
  – BUT: Using it can result in wrong conclusions when it is not appropriate
[Figure: the normal density curve; the mean μ and the points μ−2σ and μ+2σ are marked on the x-axis.]
The normal distribution
• The probability density function is

  f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x-\mu)^2/(2\sigma^2)}

  where E(X) = μ and Var(X) = σ²
• Notation: N(μ, σ²)
• Standard normal distribution: N(0, 1)
• Using the normal density is often OK unless the actual distribution is very skewed
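As a quick numeric check of the density formula, a minimal Python sketch (the values μ = 2, σ = 1.5 and x = 3 are arbitrary examples) comparing a direct evaluation with scipy's implementation:

```python
import math
from scipy.stats import norm

mu, sigma = 2.0, 1.5  # arbitrary example parameters

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2), evaluated directly from the formula above."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

x = 3.0
print(normal_pdf(x, mu, sigma))          # direct evaluation of the formula
print(norm.pdf(x, loc=mu, scale=sigma))  # scipy's implementation; the two agree
```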
Normal probability plots
• Plotting the quantiles of the data versus the quantiles of the distribution.
• If the data is approximately normally distributed, the plot will approximately show a straight line.

[Figure: Normal Q-Q plot of "Household income in thousands"; x-axis: Observed Value (−200 to 1200), y-axis: Expected Normal (−3 to 4).]
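A minimal sketch of how such a plot can be produced in Python; scipy.stats.probplot plots the data quantiles against the normal quantiles, and the skewed lognormal toy data here is just an assumed stand-in for a variable like household income:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
data = rng.lognormal(mean=4.0, sigma=0.8, size=500)  # skewed toy data (assumed)

# Quantiles of the data versus quantiles of the normal distribution;
# skewed data bends away from the straight reference line.
stats.probplot(data, dist="norm", plot=plt)
plt.show()
```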
The Normal versus the Binomial distribution
• When n is large and π is not too close to 0 or 1, the Binomial distribution becomes very similar to the Normal distribution with the same expectation and variance.
• This is a phenomenon that happens for all distributions that can be seen as a sum of independent observations.
• It can be used to make approximate computations for the Binomial distribution.
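A minimal numeric sketch of the approximation (n = 50 and π = 0.3 are assumed example values; the half-unit shift is the standard continuity correction, which the slide does not mention but which improves the approximation):

```python
import math
from scipy.stats import binom, norm

n, p = 50, 0.3                       # assumed example values
mu = n * p                           # same expectation
sigma = math.sqrt(n * p * (1 - p))   # same standard deviation

k = 18
print(binom.cdf(k, n, p))                # exact Binomial P(X <= 18)
print(norm.cdf((k + 0.5 - mu) / sigma))  # Normal approximation (continuity-corrected)
```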
The Exponential distribution
• The exponential distribution is a distribution for positive numbers (parameter λ):

  f(t) = \lambda e^{-\lambda t}

• It can be used to model the time until an event, when events arrive randomly at a constant rate:

  E(T) = 1/\lambda, \quad Var(T) = 1/\lambda^2
Sampling
• We need to start connecting the probability models we have introduced with the data we want to analyze.
• We (usually) want to regard our data as a simple random sample from a probability model:
  – Each observation is sampled independently of the others
  – Each observation is sampled from the same probability model
• Thus we go on to study the properties of simple random samples.
Example: The mean of a random sample
• If X1, X2, …, Xn is a random sample, then their sample mean is defined as

  \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i

• As it is a function of random variables, it is a random variable.
• If E(Xi) = μ, then E(X̄) = μ
• If Var(Xi) = σ², then Var(X̄) = σ²/n
Example
• Assume X1, X2, …, X10 is a random sample from the binomial distribution Bin(20, 0.2).
• We get

  E(Xi) = 20 · 0.2 = 4
  Var(Xi) = 20 · 0.2 · (1 − 0.2) = 3.2
  E(X̄) = 4
  Var(X̄) = 3.2/10 = 0.32

[Figure: the probability distribution of X̄, concentrated around 4; x-axis from 0 to 20, y-axis from 0.00 to 0.20.]
Simulation
• Simulation: to generate outcomes by computer, on the basis of pseudo-random numbers.
• Pseudo-random numbers: generated by an algorithm completely unrelated to the way the numbers are used, so that they appear random. Usually generated to be uniformly distributed between 0 and 1.
• There is a correspondence between random variables and algorithms to simulate outcomes.
Examples
• To simulate outcomes 1, 2, …, 6, each with probability 1/6: simulate a pseudo-random u in [0,1), and let the outcome be i if u is between (i−1)/6 and i/6.
• To simulate an exponentially distributed X with parameter λ: simulate a pseudo-random u in [0,1), and compute x = −log(u)/λ.
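A minimal Python sketch of both recipes (in the exponential case, 1 − u is used in place of u so that log is never taken of 0; both are uniform on the unit interval):

```python
import math
import random

random.seed(1)

def simulate_die():
    """Outcome i in {1,...,6} when u lies between (i-1)/6 and i/6."""
    u = random.random()              # pseudo-random u in [0, 1)
    return int(u * 6) + 1

def simulate_exponential(lam):
    """Exponential outcome x = -log(u)/lam; 1 - u avoids log(0)."""
    u = random.random()
    return -math.log(1.0 - u) / lam

times = [simulate_exponential(2.0) for _ in range(10000)]
print(sum(times) / len(times))       # close to E(T) = 1/lambda = 0.5
```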
Stochastic variables and simulation of outcomes
• The histogram of n simulated values will approach the probability distribution simulated from, as n increases.

[Figure: histograms of simulated values for n = 100, n = 1000, and n = 100000, increasingly close to the underlying distribution.]
Using simulation to study properties of samples
• We saw how we can find theoretically the expectation and variance of some functions of a sample.
• Instead, we can simulate the function of the sample a large number of times, and study the distribution of these numbers: this gives approximate results.
Example
• X1, X2, …, X10 is a random sample from the binomial distribution Bin(20, 0.2).
• Simulating these 100000 times, and computing X̄ each time, we get the histogram below.
• The average of these 100000 numbers is 4.001, and their variance is 0.3229.

[Figure: histogram (Frequency) of the 100000 simulated values of X̄, ranging from about 2 to 7.]
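A minimal numpy sketch of this experiment (the exact averages depend on the seed, but should land close to the theoretical values 4 and 0.32):

```python
import numpy as np

rng = np.random.default_rng(0)

# 100000 samples of size 10 from Bin(20, 0.2), and the mean of each sample
samples = rng.binomial(n=20, p=0.2, size=(100000, 10))
xbar = samples.mean(axis=1)

print(xbar.mean())       # close to E(X-bar) = 4
print(xbar.var(ddof=1))  # close to Var(X-bar) = 3.2/10 = 0.32
```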
Studying the properties of averages
• If X1, X2, …, Xn is a random sample from some distribution, it is very common to want to study the mean.
• In the following example, we have sampled from the Exponential distribution with parameter λ = 1:
  – First (done 10000 times) taken the average of 3 samples
  – Then (done 10000 times) taken the average of 30 samples
  – Then (done 10000 times) taken the average of 300 samples (see the sketch after the figure below)
[Figure: four histograms — the Exp(1) density, and the distributions (Frequency) of averages of 3, of 30, and of 300 samples; the histograms become increasingly concentrated around 1 and increasingly bell-shaped.]
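A minimal numpy sketch reproducing the experiment (10000 averages each of 3, 30, and 300 samples from the Exp(1) distribution):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, m in zip(axes, [3, 30, 300]):
    # 10000 averages, each over m samples from Exp(1)
    averages = rng.exponential(scale=1.0, size=(10000, m)).mean(axis=1)
    ax.hist(averages, bins=50)
    ax.set_title(f"Average of {m}")
plt.show()
```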
The Central Limit Theorem
• It is a very important fact that the above happens no matter what distribution you start with.
• The theorem states: if X1, X2, …, Xn is a random sample from a distribution with expectation μ and variance σ², then

  Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}

  approaches a standard normal distribution when n gets large.
Example
• Let X be from Bin(n, π).
• X/n can be seen as the average over n Bernoulli variables, so we can apply the theory above.
• We get that when n grows, the expression

  \frac{X/n - \pi}{\sqrt{\pi(1-\pi)/n}}

  gets an approximately standard normal distribution N(0, 1).
• A rule for when to accept the approximation: nπ(1 − π) ≥ 9.
The sampling distribution of the sample variance
• Recall: the sample variance is

  S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2

• We can show theoretically that its expectation is equal to the variance of the original distribution.
• We know that its distribution is approximately normal if the sample is large.
• If the underlying distribution is normal N(μ, σ²):
  – Var(S²) = 2σ⁴/(n − 1)
  – (n − 1)S²/σ² is distributed as the χ²_{n−1} distribution
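A minimal simulation check of the last claim (μ = 0, σ = 2 and n = 10 are assumed example values; (n − 1)S²/σ² should behave like a χ² variable with 9 degrees of freedom):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
n, mu, sigma = 10, 0.0, 2.0  # assumed example values

# 100000 samples of size n from N(mu, sigma^2); S^2 uses the 1/(n-1) definition
s2 = rng.normal(mu, sigma, size=(100000, n)).var(axis=1, ddof=1)
stat = (n - 1) * s2 / sigma**2

print(stat.mean(), chi2.mean(df=n - 1))  # both close to n - 1 = 9
print(stat.var(), chi2.var(df=n - 1))    # both close to 2(n - 1) = 18
```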
The Chi-square distribution
• The Chi-square distribution with n degrees of freedom is denoted χ²_n.
• It is the distribution of the sum of the squares of n independent random variables with standard normal distributions.

[Figure: the χ²₄ density; x-axis from 0 to 10, y-axis from 0.00 to 0.15.]
Estimation
• We have previously looked at
  – Probability models (with parameters)
  – Properties of samples from such probability models
• We now turn this around and start with a dataset, and try to find a probability model fitting the data.
• A (point) estimator is a function of the data, meant to estimate a parameter of the model.
• A (point) estimate is a value of the estimator, computed from the data.
Properties of estimators
• An estimator is unbiased if its expectation is equal to the parameter it is estimating.
• The bias of an estimator is its expectation minus the parameter it is estimating.
• The efficiency of an unbiased estimator is measured by its variance: one would like to have estimators with high efficiency (i.e., low variance).
Confidence intervals: Example
• Assume μ and σ² are some real numbers, and assume the data X1, X2, …, Xn are a random sample from N(μ, σ²).
  – Then

    Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)

  – thus

    P(-1.96 \le Z \le 1.96) = 95\%

  – so

    P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 95\%

  and we say that

    \left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)

  is a confidence interval for μ with 95% confidence, based on the statistic X̄.
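A minimal sketch of the resulting computation (σ = 2 known and n = 25 are assumed example values):

```python
import math
import numpy as np

rng = np.random.default_rng(0)
mu_true, sigma, n = 10.0, 2.0, 25   # assumed example values; sigma treated as known

x = rng.normal(mu_true, sigma, size=n)
xbar = x.mean()
half_width = 1.96 * sigma / math.sqrt(n)

# 95% confidence interval for mu, based on the statistic X-bar
print(xbar - half_width, xbar + half_width)
```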
Confidence intervals: interpretation
• Interpretation: if we do the following a large number of times:
  – We pick μ (and σ²)
  – We generate data and the statistic X̄
  – We compute the confidence interval
  then the confidence interval will contain μ roughly 95% of the time (as the sketch below illustrates).
• Note: the confidence interval pertains to μ (and σ²), and to the particular statistic. If a different statistic is used, a different confidence interval could result.
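This interpretation can be checked by simulation; a minimal sketch (μ, σ and n are assumed example values; roughly 95% of the computed intervals should contain μ):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 10.0, 2.0, 25, 100000  # assumed example values

xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
half = 1.96 * sigma / np.sqrt(n)
covered = (xbar - half <= mu) & (mu <= xbar + half)

print(covered.mean())  # fraction of intervals containing mu; close to 0.95
```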
Example: a different statistic
• Assume in the example above we use

  Z_0 = \frac{X_1 - \mu}{\sigma}

  instead of Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}.
• We then get Z₀ ~ N(0, 1) as before, and the confidence interval (X₁ − 1.96σ, X₁ + 1.96σ).
• Note how this is different from before, as we have used a different statistic.
Alternative concept: Credibility interval
• The knowledge about μ can be formulated as a probability distribution.
• If an interval I has 95% probability under this distribution, then I is called a credibility interval for μ, with credibility 95%.
• It is very common, but wrong, to interpret confidence intervals as credibility intervals.
Example: Finding credibility intervals
• We must always start with a probability distribution π(μ) describing our knowledge about μ before looking at data.
• As above, the probability distribution g for Z|μ is the normal distribution N(μ, σ²/n).
• Using Bayes' formula, we get a probability distribution f for μ|Z:

  f(\mu \mid Z) = \frac{g(Z \mid \mu)\,\pi(\mu)}{P(Z)}
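A minimal grid-approximation sketch of this Bayes computation (flat prior π(μ) = 1; σ, n and the observed X̄ are assumed example values; the posterior should come out as N(X̄, σ²/n), matching the next slide):

```python
import numpy as np
from scipy.stats import norm

sigma, n, xbar = 2.0, 25, 10.3  # assumed example values; xbar plays the role of Z

mu_grid = np.linspace(5, 15, 2001)   # grid of candidate mu values
prior = np.ones_like(mu_grid)        # "flat" prior pi(mu) = 1
like = norm.pdf(xbar, loc=mu_grid, scale=sigma / np.sqrt(n))  # g(Z | mu)

posterior = like * prior
posterior /= np.trapz(posterior, mu_grid)  # normalization plays the role of P(Z)

mean = np.trapz(mu_grid * posterior, mu_grid)
sd = np.sqrt(np.trapz((mu_grid - mean) ** 2 * posterior, mu_grid))
print(mean, xbar)                          # posterior mean ~ X-bar
print(sd, sigma / np.sqrt(n))              # posterior sd ~ sigma/sqrt(n)
print(mean - 1.96 * sd, mean + 1.96 * sd)  # 95% credibility interval
```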
Finding credibility intervals (cont.)
• IF we assume "flat" knowledge about μ before observing data, i.e., that π(μ) = 1, then

  \mu \mid Z \sim N(\bar{X}, \sigma^2/n)

  and a credibility interval becomes

  \left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)

• Similarly, if we assume π(μ) = 1 and only observe X₁, then a credibility interval becomes (X₁ − 1.96σ, X₁ + 1.96σ).
Summary on confidence and credibility intervals
• Confidence and credibility intervals are NOT the same.
• A confidence interval says something about a parameter AND a random variable (or statistic) based on it.
• A credibility interval describes the knowledge about the parameter; it must always be based both on a specification of the knowledge before making the observations and on the observations themselves.
• In many cases, computed confidence intervals correspond to credibility intervals under a certain assumed prior knowledge.