Session 7
Standard errors, Estimation
and Confidence Intervals
Learning Objectives
By the end of this session, you will be able to:

explain what is meant by an estimate of a population parameter, and its standard error

explain the meaning of a confidence interval

calculate a confidence interval for the population mean using sample data, and state the assumptions underlying that calculation
Reminder: What is inference?

Inference is about drawing conclusions concerning
population characteristics using information gathered
from the sample

It is assumed that the sample is representative of the
population

A further assumption is that the sample has been
drawn as a simple random sample from an infinite
population
Estimation
                  Population    Sample
Mean              $\mu$         $\bar{x}$
Variance          $\sigma^2$    $s^2$
Std. deviation    $\sigma$      $s$
Population characteristics (parameters) are denoted by Greek letters, sample values by Latin letters.
Sample characteristics are measurable and form estimates of the population values.
Example of statistical inference

What is the mean number of persons
per household in Mukono district?

Data from 80 households surveyed in this
district gave a mean household size of 5.6
with a standard deviation of 3.30.

Hence our best estimate of the mean household
size in Mukono district is 5.6.
What results are likely if we sampled
again with a different set of households?
Example using Stata
Open Stata file UNHS_hh&poverty.dta
The numeric code for Mukono is 109
Use summarize dialogue
Type db summarize or use menu
Statistics > Summaries, tables > Summaries > Summary Statistics
Variable hhsize
Then use by/if/in tab; dist == 109 is the condition
Results
[Stata output: summaries for whole sample, and summaries for Mukono only]
The distribution of means

Suppose 10 University students were given a
standard meal and the time taken to consume
the meal was recorded for each.

Suppose the 10 values gave:
mean = 11.24, with std.dev.= 0.864

Let’s assume this exercise was repeated 50 times
with different samples of students

A histogram of the resulting 500 observations appears below,
followed by a histogram of the 50 sample means
Histogram of raw data
[Figure: histogram of the 500 raw observations]
The data appear to follow a normal distribution
Histogram of 50 sample means
[Figure: histogram of the 50 sample means]
The distribution of the sample means is called the sampling distribution of the mean.
Notice that the variability of this distribution is smaller than the variability of the raw data.
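The repeated-sampling exercise described above can be imitated with a small simulation. This is only a sketch: it assumes the meal times come from a normal population whose mean and standard deviation equal the sample values quoted earlier (11.24 and 0.864), and the function name and seed are illustrative choices, not part of the original session.

```python
import random
import statistics

# Assumption for illustration only: treat the quoted sample values
# as if they were the true population mean and standard deviation.
POP_MEAN = 11.24
POP_SD = 0.864

def simulate_means(n_samples=50, n_per_sample=10, seed=1):
    """Draw n_samples samples; return (all raw values, list of sample means)."""
    rng = random.Random(seed)
    raw, means = [], []
    for _ in range(n_samples):
        sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(n_per_sample)]
        raw.extend(sample)
        means.append(statistics.mean(sample))
    return raw, means

raw, means = simulate_means()
sd_raw = statistics.stdev(raw)      # spread of the 500 individual values
sd_means = statistics.stdev(means)  # spread of the 50 sample means
print(f"observations: {len(raw)}, sd of raw data: {sd_raw:.3f}")
print(f"sample means: {len(means)}, sd of means: {sd_means:.3f}")
```

The standard deviation of the 50 means should come out clearly smaller than that of the 500 raw values, which is the point made by the two histograms.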
Back to estimation…
The estimate of the mean household size in
Mukono district was 5.6.
Is this sufficient for reporting purposes, given that this
answer is based on one particular sample?
What we have is an estimate based on a sample of size
80. But how good is this estimate?
We need a measure of the precision, i.e. variability, of
this estimate…
Sampling Variability
The precision of the sample mean $\bar{x}$ as an estimate of $\mu$ depends on:
(i) the sample size ($n$), since the more data we collect, the more we know about the population, and
(ii) the inherent variability in the data, $\sigma^2$.
These two quantities must enter the measure of precision of any estimate of a population parameter. We aim for high precision, i.e. low standard error!
Standard error of the mean
The precision of $\bar{x}$ as an estimate of $\mu$ is given by the standard error of the mean:
$\text{s.e.}(\bar{x}) = \sigma / \sqrt{n}$
Also written as s.e.m., or sometimes s.e.
It is estimated using the sample data: $s / \sqrt{n}$
For the example on household size,
s.e. $= 3.298 / \sqrt{80} = 3.298 / 8.944 = 0.369$
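The household-size arithmetic can be checked with a few lines of code. This is a minimal sketch; the function name `standard_error` is a hypothetical helper introduced here, not something from the session materials.

```python
import math

def standard_error(s, n):
    """Estimated standard error of the mean: s divided by the square root of n."""
    return s / math.sqrt(n)

# Household size example: s = 3.298 from n = 80 households
se = standard_error(3.298, 80)
print(round(se, 3))  # 0.369
```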
Confidence Interval for $\mu$
Instead of using a point estimate, it is usually more informative to summarise using an interval which is likely (i.e. with 95% confidence) to contain $\mu$.
This is called an interval estimate or a Confidence Interval (C.I.)
For example, we could report that the mean household size in Mukono district is 5.6 with 95% confidence interval (4.87, 6.33), i.e. we are 95% confident that the interval (4.87, 6.33) includes the true value $\mu$.
Analysis using Stata
Type db ci or use menu
Use the by/if/in tab as before
Results
[Stata output: confidence intervals for whole sample, and just for Mukono]
Finding the Confidence Interval
The 95% confidence limits for $\mu$ (lower and upper) are calculated as:
$\bar{x} - t_{n-1}\,(s/\sqrt{n})$ and $\bar{x} + t_{n-1}\,(s/\sqrt{n})$
where $t_{n-1}$ is the two-sided 5% point of the t-distribution with $(n-1)$ degrees of freedom.
Statistical tables and statistical software give t-values.
[Figure: t-distribution centred at 0, with 2½% of the area below $-t$ and 2½% above $t$]
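t-values for finding 95% C.I.
P is the two-sided percentage point; use the P = 5 column for a 95% C.I.

d.f.    P = 10    P = 5    P = 2
1       6.31      12.7     31.8
2       2.92      4.30     6.96
3       2.35      3.18     4.54
4       2.13      2.78     3.75
5       2.02      2.57     3.36
6       1.94      2.45     3.14
7       1.89      2.36     3.00
8       1.86      2.31     2.90
9       1.83      2.26     2.82
10      1.81      2.23     2.76
20      1.72      2.09     2.53
30      1.70      2.04     2.46
40      1.68      2.02     2.42
60      1.67      2.00     2.39
∞       1.64      1.96     2.33

$\bar{x} \pm t_{n-1}\,(s/\sqrt{n})$
[Figure: t-distribution centred at 0, with 2½% of the area below $-t$ and 2½% above $t$]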

Correct interpretation of C.I.s
If we sampled repeatedly and found a 95% C.I. each time, about 95% of them would include the true $\mu$, i.e. there is a 95% chance that the interval from any single sample would include $\mu$.
[Figure: intervals from repeated samples; horizontal axis 0 to 50, vertical axis 10 to 13]
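The repeated-sampling interpretation can be illustrated by simulation. This is a sketch under stated assumptions: the population is taken to be normal with mean 11.24 and standard deviation 0.864 (the meal-time values, used here as if they were the truth), each sample has 10 observations, and the t-value 2.26 for 9 degrees of freedom is read from the table. The function name, seed and number of repeats are illustrative.

```python
import math
import random
import statistics

TRUE_MEAN = 11.24  # assumed population mean (illustrative)
TRUE_SD = 0.864    # assumed population std. deviation (illustrative)
N = 10             # observations per sample
T_9 = 2.26         # two-sided 5% t-value for 9 d.f., from the table

def coverage(n_repeats=5000, seed=0):
    """Proportion of repeated 95% C.I.s that include the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_repeats):
        sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
        xbar = statistics.mean(sample)
        half_width = T_9 * statistics.stdev(sample) / math.sqrt(N)
        if xbar - half_width <= TRUE_MEAN <= xbar + half_width:
            hits += 1
    return hits / n_repeats

cov = coverage()
print(f"proportion of intervals containing the true mean: {cov:.3f}")
```

The printed proportion should be close to 0.95, varying slightly with the seed and the number of repeats.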
An example (persons per HH)
For rural households (n=40) in Mukono, we
find mean=6.43, std.dev.=3.54 for the number
of persons per household.
Hence a 95% confidence interval for the true mean number of persons per household is:
$6.43 \pm t_{39}\,(s/\sqrt{n}) = 6.43 \pm 2.02 \times (3.54/\sqrt{40})$
$= 6.43 \pm 1.13$
$= (5.30, 7.56)$
Can you interpret this interval? Write down your
answer. We will then discuss.
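The calculation for rural households can be reproduced as follows. This is a minimal sketch; `confidence_interval` is a hypothetical helper name, and the t-value has to be supplied by the user from tables or software.

```python
import math

def confidence_interval(mean, s, n, t_value):
    """Return (lower, upper) limits: mean -/+ t_value * s / sqrt(n)."""
    half_width = t_value * s / math.sqrt(n)
    return mean - half_width, mean + half_width

# Rural households in Mukono: n = 40, mean = 6.43, s = 3.54, t_39 = 2.02
lower, upper = confidence_interval(6.43, 3.54, 40, 2.02)
print(f"({lower:.2f}, {upper:.2f})")  # (5.30, 7.56)
```

The same function gives the earlier interval (4.87, 6.33) for all 80 Mukono households when called with mean 5.6, s = 3.298, n = 80 and a t-value of about 1.99 for 79 degrees of freedom (a value taken from software, since the table above does not list it).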
Analysis in Stata
Press Page Up to retrieve the last command
Then add “& rurban == 0” to the condition
Or use the menus and change the dialogue
Underlying assumptions
The above computation of a confidence interval
assumes that the data have a normal distribution.
More exactly, it requires the sampling distribution of
the mean to have a normal distribution.
What happens if data are not normal?
This is not a serious problem if the sample size is large, because of the Central Limit Theorem: for large sample sizes, the sampling distribution of the mean is approximately normal.
Assumptions - continued
So even when data are not normal, the formula for a
95% confidence interval will give an interval whose
“confidence” is still high - approximately 95%.
It is better to attach some measure of uncertainty
than worry about the exact confidence level.
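The claim that the confidence level stays close to 95% for non-normal data can be explored by simulation. This is a sketch under assumptions not in the original: the data are drawn from a strongly skewed exponential population with mean 1, the sample size is 40 as in the rural households example, and the t-value 2.02 for 39 degrees of freedom is reused from that example.

```python
import math
import random
import statistics

N = 40           # sample size, as in the rural households example
T_39 = 2.02      # t-value used earlier for 39 d.f.
TRUE_MEAN = 1.0  # mean of the assumed exponential population (rate 1)

def coverage_skewed(n_repeats=5000, seed=0):
    """Proportion of nominal 95% C.I.s that include the true mean for skewed data."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_repeats):
        sample = [rng.expovariate(1.0) for _ in range(N)]
        xbar = statistics.mean(sample)
        half_width = T_39 * statistics.stdev(sample) / math.sqrt(N)
        if xbar - half_width <= TRUE_MEAN <= xbar + half_width:
            hits += 1
    return hits / n_repeats

cov_skewed = coverage_skewed()
print(f"coverage with skewed data: {cov_skewed:.3f}")
```

With data this skewed the proportion typically falls somewhat below 0.95 but remains high, in line with the statement that the confidence is still approximately 95%.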