Module 4 Practical 7
Practical 7
Estimation and Confidence Intervals
Objectives:
By the end of this practical you should be able to:
• describe what is meant by the standard error of a sample estimate and calculate the standard error for a sample mean
• explain the meaning and interpretation of a confidence interval
• calculate a 95% confidence interval for a population mean
• explain the effect that sample size and variability in the raw data have on the width of the confidence interval
1. Open the Stata file named unhs_hh&poverty.dta. This file has household information from
UNHS2 on several variables, some of which relate to education and others to poverty.
One question of interest was “What is the average size of a household in rural areas in each of the four
regions of Uganda?”
(a) Use the command summ hhsize if region==1 & rurban==0 in Stata to determine, for
rural households in the Central region, an estimate of the average number of members in a
household, and the corresponding standard deviation. Note them down in the first row of the
table below, and use this information to determine the standard error of your estimate of the
mean.
Region              | n (rural) | Mean | Std. deviation | Std. error = s/√n
Central (region=1)  |           |      |                |
Eastern (region=2)  |           |      |                |
Northern (region=3) |           |      |                |
Western (region=4)  |           |      |                |
Repeat your procedure above to obtain similar results for the Eastern, Northern and Western
regions of Uganda.
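The standard error calculation in part (a) can be cross-checked outside Stata. The following is a minimal sketch in Python, not part of the original practical: the household sizes are made-up values (not UNHS data) and the function name standard_error is hypothetical.

```python
import math
import statistics

def standard_error(values):
    """Return (n, mean, sd, se), where se = s / sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD with divisor n - 1
    se = sd / math.sqrt(n)
    return n, mean, sd, se

# Hypothetical household sizes for illustration only (not UNHS data)
hhsize = [3, 5, 4, 7, 6, 2, 5, 8, 4, 6]
n, mean, sd, se = standard_error(hhsize)
print(f"n={n} mean={mean:.2f} sd={sd:.3f} se={se:.3f}")
```

Note that the divisor is the square root of n, not n itself, so quadrupling the sample size only halves the standard error.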
Districts Training Programme
(b) Use your results above to compute (by “hand”) a 95% confidence interval for the mean
number of persons per household in the Northern region. Obtain your t-value using Statistical
Tables or the Stata command display invttail(k,0.025), which gives the upper-tail value from a
t-distribution with k degrees of freedom (here k = n - 1).
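The hand calculation is mean ± t × (s/√n). As a rough sketch of that arithmetic in Python (an illustration only, with a hypothetical function name and made-up summary statistics rather than UNHS results), the t-value is passed in after being read from tables or from invttail:

```python
import math

def t_confidence_interval(mean, sd, n, t_crit):
    """CI for a mean: mean ± t_crit * sd / sqrt(n), with t_crit taken at n - 1 df."""
    se = sd / math.sqrt(n)
    margin = t_crit * se
    return mean - margin, mean + margin

# Hypothetical summary statistics (not UNHS results): n = 10, so df = 9.
# 2.262 is the tabulated upper 2.5% point of the t-distribution with 9 df,
# i.e. the value of invttail(9,0.025) to three decimals.
lower, upper = t_confidence_interval(mean=5.0, sd=1.8, n=10, t_crit=2.262)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

For the real data, replace the mean, standard deviation, n and t-value with those noted in the table for the Northern region.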
(c) Now write down the answer to the question first proposed with respect to the Northern
region, i.e. “What is the average size of rural households in the Northern region of Uganda?”,
attaching to your answer the confidence interval calculated above and interpreting carefully what
it means.
(d) Verify your answers above, and obtain 95% confidence intervals for the remaining 3 regions,
by using the Stata command ci hhsize if region==1 & rurban==0 (changing the region code as
needed). Enter your results for the 95% confidence interval in the relevant column in the table below.
Region              | n (rural) | 90% conf. int. | 95% conf. int. | 99% conf. int.
Central (region=1)  |           |                |                |
Eastern (region=2)  |           |                |                |
Northern (region=3) |           |                |                |
Western (region=4)  |           |                |                |
Note that a 99% confidence interval (or other levels of confidence) can be obtained using:
ci hhsize if region==1 & rurban==0 , level(99)
Use the above command to obtain 90% and 99% confidence intervals and enter them in the
table above. What can you say about the width of the confidence interval as the level of
confidence increases from 90% to 95% to 99%?
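To see how the multiplier behind the interval changes with the confidence level, the sketch below computes interval widths at 90%, 95% and 99%. It is an illustration under stated assumptions: it uses the normal approximation (Stata's ci uses the t-distribution, which gives very similar values when n is large), and the standard deviation and sample size are hypothetical.

```python
import math
from statistics import NormalDist

def ci_width(sd, n, level):
    """Width of a normal-approximation CI for a mean at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # two-sided critical value
    return 2 * z * sd / math.sqrt(n)

# Hypothetical sd and n, chosen only for illustration
widths = {level: ci_width(sd=3.0, n=400, level=level) for level in (0.90, 0.95, 0.99)}
for level, width in widths.items():
    print(f"{level:.0%} CI width: {width:.3f}")
```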
2. In this exercise, you will use samples of different sizes to explore the effect that sample size
has on the width (upper limit minus lower limit) of the confidence interval.
Open the Stata file called hhsize_samples.dta. This file has information on household size for
samples of different sizes drawn from rural households in the Central region of Uganda. The
column names, i.e. sample10, sample20, sample50, sample100 and sample200, indicate the number
of observations in each sample.
(a) For each sample, obtain the mean, standard deviation, standard error of the mean, and a
95% confidence interval for the mean. Note down the results below, then calculate (by “hand”)
results for the final column.
Sample size | Mean | Standard deviation | Std. error | 95% C.I. for true mean | Width of 95% conf. interval
10          |      |                    |            |                        |
20          |      |                    |            |                        |
50          |      |                    |            |                        |
100         |      |                    |            |                        |
200         |      |                    |            |                        |
(b) Do you observe any trend in the standard error with increasing sample size? If so, can you
suggest reasons for it?
(c) What can you say about the change in the width of the confidence interval as the sample size
increases? What reasons can you give for any changes you observe?
(d) With respect to one particular sample size, explain how you think your standard errors and
confidence intervals would change if the standard deviation were to increase. If it did, would your
estimate of the mean be less or more precise?
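The relationships explored in parts (b) to (d) follow from the formula s/√n. The sketch below tabulates the standard error and an approximate 95% interval width for the five sample sizes under two standard deviations. The standard deviations are hypothetical, and the normal value of about 1.96 stands in for the t-value, which understates the width for the smallest samples.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96; normal approximation to the t-value

def se_and_width(sd, n):
    """Return the standard error s/sqrt(n) and the approximate 95% CI width."""
    se = sd / math.sqrt(n)
    return se, 2 * Z95 * se

# Hypothetical standard deviations; sample sizes match the columns of hhsize_samples.dta
for sd in (2.0, 4.0):
    for n in (10, 20, 50, 100, 200):
        se, width = se_and_width(sd, n)
        print(f"sd={sd} n={n:>3} se={se:.3f} width={width:.3f}")
```

Compare your own table against this pattern: the observed standard deviations will vary from sample to sample, so the decline in width need not be perfectly smooth.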
3. Consider again data from the file unhs_hh&poverty.dta. After restricting the data to rural
households in the Central region, the results below (using the Stata command proportion hsex)
were obtained for the proportion of male- and female-headed households. [Note: there is no need
to reproduce these results].
Proportion estimation                     Number of obs   =       1520

--------------------------------------------------------------
             | Proportion   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
hsex         |
      Female |   .2473684   .0110709      .2256525    .2690844
        Male |   .7526316   .0110709      .7309156    .7743475
--------------------------------------------------------------
Interpret these results. In particular, how would you report results in answer to the question
“What is the proportion of female-headed households in the rural Central region of Uganda?”
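The figures in the output above can be checked by hand. The following sketch recomputes the standard error and an approximate 95% interval for the female proportion. It assumes a divisor of n - 1 in the variance, which is consistent with the reported .0110709, and uses the normal approximation for the limits, so they agree with Stata's only to about four decimal places.

```python
import math
from statistics import NormalDist

n = 1520
p_female = 0.2473684  # proportion taken from the Stata output

# Standard error with divisor n - 1 (an assumption that matches the reported value)
se = math.sqrt(p_female * (1 - p_female) / (n - 1))

# Normal-approximation 95% interval; Stata's reported limits differ slightly
z = NormalDist().inv_cdf(0.975)
lower, upper = p_female - z * se, p_female + z * se
print(f"se={se:.7f} 95% CI=({lower:.4f}, {upper:.4f})")
```

Both proportions share the same standard error because p(1 - p) is unchanged when p is swapped for 1 - p.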
4. This final exercise is aimed at giving you further practice in deriving estimates and forming
confidence intervals, and interpreting these quantities.
EITHER work on a data set of your own to find estimates of key responses of interest, reporting
these estimates together with their standard errors, and also reporting confidence intervals for the
true population parameters,
OR use the file unhs_hh&poverty.dta to answer the question posed below for your own
district. Stata commands and steps needed for selecting data from Mukono district in the Central
region are given below to help you in selecting data from your own district.
Step 1: Use the command preserve (so you can return to the original data set by using the
command restore).
Step 2: Use the command keep if region==1 (if your district is in the Central region, else change
to 2, or 3 or 4 as appropriate).
Step 3: Use the command label list distlab to determine the code used in the variable dist for
your own district (Note: distlab is a label that holds label values for codes of variable dist). For
example, Mukono in the Central region will have code 109.
Step 4: Use the command keep if dist==109 (for restricting data to Mukono district). Replace
the code 109 by the district code for the district of your own choice.
Questions to answer with respect to data from your selected district:
(a) The variable log_welf refers to the logarithm of the household’s monthly consumption
expenditure per adult equivalent, used as a proxy for the household’s income. What is the mean
monthly consumption expenditure per adult equivalent for rural households in your own district?
(Your answer should also include measures of precision, e.g. standard error and confidence
interval).
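Because log_welf is on a log scale, its mean and confidence limits are in log units. One way to report on the original expenditure scale is to exponentiate them, which gives a geometric mean rather than an arithmetic mean. The sketch below illustrates this under assumptions that should be checked against the data documentation: natural logarithms are assumed, the values are made up (not UNHS data), and the function name is hypothetical.

```python
import math
import statistics

def geometric_mean_ci(log_values, t_crit):
    """CI for the mean on the log scale, exponentiated back to the original units."""
    n = len(log_values)
    mean_log = statistics.mean(log_values)
    se_log = statistics.stdev(log_values) / math.sqrt(n)
    lower = mean_log - t_crit * se_log
    upper = mean_log + t_crit * se_log
    return math.exp(mean_log), math.exp(lower), math.exp(upper)

# Hypothetical log expenditures (natural logs assumed; not UNHS data)
log_welf = [9.8, 10.1, 10.4, 9.9, 10.6, 10.2, 10.0, 10.3, 10.5, 10.2]
# 2.262 is the tabulated t-value for 9 df (n - 1 with n = 10)
gm, lo, hi = geometric_mean_ci(log_welf, t_crit=2.262)
print(f"geometric mean={gm:.0f}, 95% CI=({lo:.0f}, {hi:.0f})")
```

The back-transformed interval is not symmetric about the geometric mean, which is expected for a quantity analysed on the log scale.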
(b) The variable hlitrate refers to whether or not the household head is literate. What proportion
of urban households have literate household heads? (Again the appropriate answer should not
be only the estimate, but also include measures of precision).
What proportion of rural households have literate household heads?
What proportion of rural households have literate female household heads?
What proportion of rural households have literate male household heads?
Write a short paragraph that summarizes the results above for rural households.