ST 432
Exam 1 Practice Questions
1. A dentist was interested in the effectiveness of a new toothpaste. Records showed
there was an average of 2.2 cavities every six months for a group of 1000
schoolchildren before they started using the new toothpaste. After six months of
the study, the dentist sampled n=10 children to determine how they were
progressing on the new toothpaste. The data follow.
Child      1   2   3   4   5   6   7   8   9   10
Cavities   0   4   2   3   2   0   3   4   1   1
a. Assume the 10 children were selected via SRS from the group of 1000.
Estimate the mean number of cavities for the whole group of 1000 and
place a bound on the error of your estimate.
$\bar{y} = 2$; bound on the error of estimation = 1.06 (t) or 0.92 (z)
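The numbers in part (a) can be checked with a short Python sketch. It assumes the usual SRS estimator with the finite population correction, and it hard-codes 2.262 as the tabled t value for 9 degrees of freedom; the function and variable names are illustrative only and do not come from the exam.

```python
import math
import statistics

def srs_mean_with_bound(sample, population_size, critical_value):
    """Return (sample mean, bound on the error of estimation) for an SRS."""
    n = len(sample)
    y_bar = statistics.mean(sample)
    s2 = statistics.variance(sample)       # sample variance, divisor n - 1
    fpc = 1 - n / population_size          # finite population correction
    standard_error = math.sqrt(fpc * s2 / n)
    return y_bar, critical_value * standard_error

cavities = [0, 4, 2, 3, 2, 0, 3, 4, 1, 1]  # children 1 through 10
T_CRIT = 2.262  # assumed table value: upper 0.025 point of t with 9 df
Z_CRIT = 1.96   # normal value behind the (z) bound

y_bar, bound_t = srs_mean_with_bound(cavities, 1000, T_CRIT)
_, bound_z = srs_mean_with_bound(cavities, 1000, Z_CRIT)
print(y_bar, round(bound_t, 2), round(bound_z, 2))
```

The same function applies to any of the SRS mean problems in this set once the data, population size, and critical value are swapped in.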
b. The dentist would like to calculate a confidence interval to see whether the
population mean has changed since before the study began. Would you
recommend a confidence interval based on a normal approximation for
this purpose? Why or why not?
No. With only n = 10 children, the sample size is too small for a normal
approximation to be reliable.
2. A local newspaper reporter is doing a story about NCSU faculty’s off-campus
consulting. This reporter is specifically interested in how much money faculty
members earn from such activities. The reporter obtains a list of all 1588 faculty
members of the university and chooses a simple random sample (SRS) of 10
people. He mails each of the ten people a survey in which he asks, among other
things, how much money each person has earned through off-campus consulting
activities during the previous calendar year. All ten people respond to the survey.
The data (in dollars) follow.
12,000   250   750   8500   3200
0        300   0     2500   3000
a. What is the target population?
All NCSU faculty
b. What is the sampling frame?
List of all 1588 faculty
c. What are the sampling units?
Individual faculty members
d. What kind of measurement bias might occur in the survey?
Individuals may not remember correctly, may misunderstand what
is considered consulting, or may lie.
e. Using an unbiased estimator, estimate the population mean and give a
bound on the error of estimation.
$\bar{y} = 3050$; bound on the error of estimation = 2906 (t) or 2518 (z)
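For reference, the arithmetic behind this answer can be written out as follows. This is a sketch that assumes the standard SRS variance estimator with the finite population correction and a tabled t value of 2.262 (9 degrees of freedom); intermediate values are rounded.

```latex
\begin{align*}
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2
     = \frac{149{,}430{,}000}{9} \approx 16{,}603{,}333 \\
\hat{V}(\bar{y}) &= \left(1 - \frac{n}{N}\right)\frac{s^2}{n}
     = \left(1 - \frac{10}{1588}\right)\frac{16{,}603{,}333}{10}
     \approx 1{,}649{,}878 \\
\sqrt{\hat{V}(\bar{y})} &\approx 1284.5 \\
B_t &= 2.262 \times 1284.5 \approx 2906, \qquad
B_z = 1.96 \times 1284.5 \approx 2518
\end{align*}
```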
f. The reporter’s editor says that his standard error is too large to be useful.
To get a smaller standard error, the reporter proposes to take a larger
sample. Using the list of 1578 faculty that were not previously selected,
he will take an SRS of 10 more faculty and send them surveys. Could
these 10 new observations be combined with the 10 previous observations
(given above) and treated as an SRS of size 20 from the population of all
1588 faculty? Explain.
Yes. Each subset of 20 unique units from the population has an
equal chance of being selected as the final sample.
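The claim in part (f) can be checked by brute force on a small case. The sketch below uses a hypothetical toy population of 6 units (standing in for the 1588 faculty), draws an SRS of 2, then an SRS of 2 from the units not yet drawn, and computes the exact probability of every combined sample; the function name is illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

def two_phase_distribution(population, n_first, n_second):
    """Exact distribution of the combined sample when an SRS of n_first
    units is followed by an SRS of n_second units from those not yet drawn."""
    population = list(population)
    dist = defaultdict(Fraction)
    first_draws = list(combinations(population, n_first))
    for first in first_draws:
        rest = [u for u in population if u not in first]
        second_draws = list(combinations(rest, n_second))
        # Probability of this particular (first, second) pair of draws.
        p = Fraction(1, len(first_draws)) * Fraction(1, len(second_draws))
        for second in second_draws:
            dist[frozenset(first + second)] += p
    return dict(dist)

# Hypothetical toy population: 6 units, two SRS draws of size 2 each.
dist = two_phase_distribution(range(6), 2, 2)
print(len(dist), set(dist.values()))
```

If every combined sample of size 4 shows the same probability, the two-phase procedure is equivalent to a single SRS of size 4 in this toy case, which mirrors the argument for 10 + 10 out of 1588.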
3. Researchers are interested in studying the number of oak trees in a wilderness
region. The region is divided into 50 plots, 10 of which are selected for the
sample via SRS. For each selected plot, the number of oak trees on the plot is
counted. The sample data appear below.
Plot        3    8    9    13   25   27   32   39   43   49
Oak Trees   42   1    5    28   12   38   15   8    41   3
a. Calculate an unbiased estimate of the total number of oak trees in the
region and place a bound on the error of estimation.
$\hat{\tau} = 965$; bound on the error of estimation = 524 (t) or 454 (z)
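A minimal Python sketch of the total estimate follows. It assumes the expansion estimator N times the sample mean, with estimated variance N^2 (1 - n/N) s^2 / n, and a tabled t value of 2.262 for 9 degrees of freedom; the names are illustrative.

```python
import math
import statistics

def srs_total_with_bound(sample, population_size, critical_value):
    """Return (estimated total, bound on the error of estimation) for an SRS."""
    n = len(sample)
    tau_hat = population_size * statistics.mean(sample)
    s2 = statistics.variance(sample)  # sample variance, divisor n - 1
    var_tau_hat = population_size**2 * (1 - n / population_size) * s2 / n
    return tau_hat, critical_value * math.sqrt(var_tau_hat)

# Oak tree counts for plots 3, 8, 9, 13, 25, 27, 32, 39, 43, 49.
oak_trees = [42, 1, 5, 28, 12, 38, 15, 8, 41, 3]

tau_hat, bound_t = srs_total_with_bound(oak_trees, 50, 2.262)  # assumed t value
_, bound_z = srs_total_with_bound(oak_trees, 50, 1.96)
print(round(tau_hat), round(bound_t), round(bound_z))
```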
4. Below is a population of five units. The study variable is denoted y.
Unit   1   2    3   4   5
y      5   12   8   4   6
Consider the following sampling design.
Sample S   P(S)
{1,4}      1/4
{2,3}      1/4
{5}        1/2
a. We want to use the sample mean to estimate the population mean. Is the
sample mean an unbiased estimator of the population mean under this
sampling design?
No. $E(\bar{y}) = \frac{1}{4}(4.5) + \frac{1}{4}(10) + \frac{1}{2}(6) = 6.625 \neq \mu = 7$
b. What is the standard deviation of the sample mean under this sampling
design?
$V(\bar{y}) = \frac{1}{4}(4.5 - 6.625)^2 + \frac{1}{4}(10 - 6.625)^2 + \frac{1}{2}(6 - 6.625)^2 = 4.17$
So SD = 2.04
c. What is the inclusion probability $\pi_i$ for each unit under this sampling
design?
$\pi_1 = \pi_2 = \pi_3 = \pi_4 = 1/4$ and $\pi_5 = 1/2$
d. Is this a probability sampling design?
Yes
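The quantities in Problem 4 can be recomputed exactly from the design table. The sketch below is one way to do it in Python with exact fractions; the dictionary layout and names are assumptions made for illustration.

```python
from fractions import Fraction

# Population values y_i, keyed by unit label.
y = {1: 5, 2: 12, 3: 8, 4: 4, 5: 6}

# Sampling design: each possible sample and its selection probability P(S).
design = {
    (1, 4): Fraction(1, 4),
    (2, 3): Fraction(1, 4),
    (5,): Fraction(1, 2),
}

def sample_mean(sample):
    """Mean of y over the units in one sample."""
    return Fraction(sum(y[i] for i in sample), len(sample))

# Mean and variance of the sampling distribution of the sample mean.
expected = sum(p * sample_mean(s) for s, p in design.items())
variance = sum(p * (sample_mean(s) - expected) ** 2 for s, p in design.items())

# Inclusion probability: total probability of the samples containing unit i.
inclusion = {i: sum(p for s, p in design.items() if i in s) for i in y}

population_mean = Fraction(sum(y.values()), len(y))
print(float(expected), float(population_mean))
print(float(variance), float(variance) ** 0.5)
print(inclusion)
```

Comparing `expected` with `population_mean` answers the unbiasedness question, and a design is a probability sampling design only if every entry of `inclusion` is positive.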
5. Consider the following small population of people and their heights.
Name              Ann   Barry   Chip   Denise   Edward
Height (inches)   62    68      76     72       73
The following sampling design will be used.
S                        P(S)
{Ann, Barry}             1/4
{Chip, Denise}           1/4
{Ann, Denise, Edward}    1/8
{Barry, Chip}            3/8
a. What is the population mean $\mu$?
70.2
b. What is the population variance $\sigma^2$?
23.36
c. Is this a fixed size sampling design? Explain.
No, some samples are size 2, some are size 3.
d. Is this a probability sampling design? Explain.
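Yes. Each possible sample has a known selection probability, and every
member of the population appears in at least one sample, so every
inclusion probability is positive.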
e. What is the mean of the sampling distribution of $\bar{y}$? Is $\bar{y}$ unbiased for
$\mu$?
$E(\bar{y}) = 70.375$. No, not unbiased.
f. What is the standard deviation of $\bar{y}$?
$V(\bar{y}) = \frac{1}{4}(65 - 70.375)^2 + \frac{1}{4}(74 - 70.375)^2 + \frac{1}{8}(69 - 70.375)^2 + \frac{3}{8}(72 - 70.375)^2 = 11.73$
SD = 3.43
g. What is the inclusion probability $\pi_i$ for each member of the population?
$\pi_1 = 3/8$; $\pi_2 = 5/8$; $\pi_3 = 5/8$; $\pi_4 = 3/8$; $\pi_5 = 1/8$
h. Would you recommend this sampling design over an SRS of size 2?
Explain.
If $\bar{y}$ is going to be used, it would at least be unbiased under SRS,
and its SD would be 2.96, smaller than the 3.43 found in part f. SRS
therefore appears better than using $\bar{y}$ with this sampling design.
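The SRS figures quoted in part (h) can be checked by listing all 10 equally likely samples of size 2. The following Python sketch does this with exact fractions; the variable names are illustrative.

```python
import math
from fractions import Fraction
from itertools import combinations

heights = {"Ann": 62, "Barry": 68, "Chip": 76, "Denise": 72, "Edward": 73}

# Under an SRS of size 2, every pair of people is equally likely.
samples = list(combinations(heights, 2))
p = Fraction(1, len(samples))
means = [Fraction(heights[a] + heights[b], 2) for a, b in samples]

# Mean, variance, and SD of the sampling distribution of the sample mean.
expected = sum(p * m for m in means)
variance = sum(p * (m - expected) ** 2 for m in means)
sd = math.sqrt(variance)

print(float(expected), float(variance), round(sd, 2))
```

The expected value matches the population mean of 70.2 from part (a), which is the unbiasedness referred to in the answer.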
6. The statistics department wants to estimate the average housing cost for a
graduate student in order to give this information to prospective students. From a
list of the 150 currently supported statistics graduate students (those receiving pay
for an assistantship or fellowship), an SRS of 10 students is selected.
a. What is the target population in this study?
All prospective graduate students.
b. What is the sampled population in this study?
All currently supported statistics graduate students.
c. Are there any possible sources of selection bias? Explain.
Those graduate students who are unsupported have no chance of
ending up in the sample.
d. Give one possible source of measurement bias.
Students may misunderstand what should be included in “housing
costs.”
e. The collected data are given below. Using an unbiased estimator, give an
estimate of the population mean and the corresponding bound on the error
of estimation.
Student                                        1     2     3     4     5     6     7     8     9     10
Housing cost per month (hundreds of dollars)   4.5   5.5   7.4   0     3.6   3.5   2.5   3.5   3.7   4.0
$\bar{y} = 3.82$ (382 dollars)
bound on the error of estimation = 1.32 (132 dollars) (t) or 1.14 (114 dollars) (z)
7. Selection bias can occur in several ways:
i) Using a sample-selection procedure that, unknown to the investigators,
depends on some characteristic associated with the properties of interest;
ii) Deliberately or purposefully selecting a “representative” sample;
iii) Misspecifying the target population;
iv) Failing to include all the target population in the sampling frame, called
undercoverage;
v) Substituting a convenient member of the population for a designated
member that is not readily available;
vi) Failing to obtain responses from all the chosen sample (nonresponse);
vii) Allowing the sample to consist entirely of volunteers.
For each of the following scenarios, state whether you believe the survey was
subject to selection bias and explain why or why not. [Answer keys to these are
not provided.]
a. A survey by a company was said to measure American attitudes toward
college. The polling firm mailed questionnaires to 4200 people across the
US and received 1188 responses. About 55% of those responding said
they had major concerns about being able to pay for their child’s college
education.
b. A chain of department stores is interested in estimating the proportion of
accounts receivable that are delinquent. The chain consists of four stores,
each having 100 accounts. Fifty accounts are chosen from store 1, 10
from store 2, 10 from store 3, and 10 from store 4.
c. You are hired to estimate the proportion of registered voters in your
county who favor an increase in the number of nuclear weapons owned by
the US. You attend one meeting of the Republican party in your county
and one meeting of the Democratic party in your county and survey all
attendees at each meeting. For a final estimate, you average the
percentages from the two meetings.
d. An investigator is curious about a community’s attitude toward tougher
credentials for day care workers proposed by the state legislature. The
investigator obtains a list of all licensed day care facilities in the
community. A sample of facilities is selected, then a sample of parents at
each selected facility is taken. In presenting the results to a local
community action group, the investigator states, “A survey of parents of
children in licensed day care facilities in this community indicates that
well over 75% are in favor of tougher credentials for day care workers.”
8. Suppose we want to estimate the proportion p of recipes in the Better Homes and
Gardens New Cook Book that do not involve animal products. We plan to take an
SRS of the N = 1251 recipes in the book.
a. We want to be 95% confident that we are within 0.03 of the true
proportion p. What is the required sample size?
$n_0 = \frac{z^2 p(1-p)}{(\mathrm{MOE})^2}$; since we have no prior information about the
proportion p, we will use $p = \frac{1}{2}$. So
$n_0 = \frac{1.96^2 \cdot \frac{1}{2}\left(1 - \frac{1}{2}\right)}{(.03)^2} \approx 1067.$
The sample size, ignoring the fpc, is large compared with the
population size, so in this case we should make the fpc
adjustment and use
$n = \frac{n_0}{1 + \frac{n_0}{N}} = \frac{1067}{1 + \frac{1067}{1251}} \approx 576.$
b. Suppose we want to reduce the margin of error by half, that is, we want to
be 95% confident that we are within 0.015 of the true proportion p. What
is the required sample size?
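Since $n_0$ is proportional to $1/(\mathrm{MOE})^2$, reducing the margin of
error by half multiplies $n_0$ by 4:
new $n_0 = 4 \times$ original $n_0 = 4(1067) = 4268$;
new $n = \frac{\text{new } n_0}{1 + \frac{\text{new } n_0}{N}} = \frac{4268}{1 + \frac{4268}{1251}} \approx 967$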
c. Suppose we want to be 98% confident that we are within 0.03 of the true
proportion p. What is the required sample size?
1 1
2.332  1  
2
2 2
z p(1  p)
n0 

 1508.
2
(MOE )
(.03)2
Again, the sample size, ignoring the fpc, is large compared with the
population size (in fact, it's larger than the population size N),
so in this case we should make the fpc adjustment and use
$n = \frac{n_0}{1 + \frac{n_0}{N}} = \frac{1508}{1 + \frac{1508}{1251}} \approx 684.$
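The three sample-size calculations in Problem 8 follow one pattern, sketched below in Python. The function name is illustrative, and the sketch rounds to the nearest integer to match the answers above; some texts round up instead, which would be an equally reasonable convention.

```python
def sample_size_for_proportion(z, margin, population_size, p=0.5):
    """Return (n0, n): the sample size ignoring the fpc, and the
    fpc-adjusted sample size n0 / (1 + n0 / N)."""
    n0 = z**2 * p * (1 - p) / margin**2
    n = n0 / (1 + n0 / population_size)
    return n0, n

N = 1251  # number of recipes in the book

# (part, z value, margin of error); p = 1/2 is used in all three parts.
cases = [("a", 1.96, 0.03), ("b", 1.96, 0.015), ("c", 2.33, 0.03)]

results = {}
for part, z, margin in cases:
    n0, n = sample_size_for_proportion(z, margin, N)
    results[part] = (round(n0), round(n))
    print(part, results[part])
```

Part (c) shows why the adjustment matters: the unadjusted value exceeds N, while the adjusted value can never be larger than the population size.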