Final Exam Review Key
Directions: Answer the following questions using the Useful Formulas at the end of this review as a reference.
The 2002 GSS provides the following statistics for the average years of education and
associated standard deviations for lower, working, middle, and upper class respondents. Use
these data to answer questions 1–4, assuming that years of education are normally
distributed.
Table 1. Mean and Std Dev for Education by Class

                  Mean    Standard Deviation    N
Lower Class       10.34   3.18                  89
Working Class     12.64   2.50                  675
Middle Class      14.19   2.39                  675
Upper Class       15.00   3.27                  49

Source: GSS 2002
1. What proportion of working class respondents have 12 to 16 years of education?
Always start with a picture. Since 12 is below the mean and 16 is above the mean, we
are looking for the area between the two scores, which lies on both sides of the mean.
To calculate the Z score, we use the formula:
$$Z = \frac{Y - \bar{Y}}{S_Y}$$
The Z score for a value of 12 is:
$$Z = \frac{12.00 - 12.64}{2.50} = \frac{-0.64}{2.50} = -0.26$$
The Z score for a value of 16 is:
$$Z = \frac{16.00 - 12.64}{2.50} = \frac{3.36}{2.50} = 1.34$$
The area between a Z of -0.26 and the mean is 0.1026. The area between a Z of 1.34
and the mean is 0.4099, so the total area between the scores is:
$$\text{Area} = 0.1026 + 0.4099 = 0.5125$$
So the proportion of working class respondents with 12 to 16 years of education is
0.5125.
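The same proportion can be checked with a short Python sketch. It assumes Python 3.8 or later, whose standard-library `statistics.NormalDist` provides the normal CDF; the helper name `proportion_between` is our own and is not part of the original key. Because it uses the exact normal curve instead of table values looked up from Z scores rounded to two decimals, its result may differ slightly from the table-based answer in the third or fourth decimal place.

```python
from statistics import NormalDist


def proportion_between(low, high, mean, sd):
    """Area under a normal curve between two raw scores (hypothetical helper)."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(high) - dist.cdf(low)


# Working class respondents from Table 1: mean 12.64, standard deviation 2.50
working = proportion_between(12, 16, mean=12.64, sd=2.50)
print(round(working, 4))
```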
2. What proportion of upper class respondents have 12 to 16 years of education?
Again, since 12 is below the mean and 16 is above the mean, we are looking for the
area between the two scores, on both sides of the mean.
The Z score for a value of 12 is:
$$Z = \frac{12.00 - 15.00}{3.27} = \frac{-3.00}{3.27} = -0.92$$
The Z score for a value of 16 is:
$$Z = \frac{16.00 - 15.00}{3.27} = \frac{1.00}{3.27} = 0.31$$
The area between a Z of -0.92 and the mean is 0.3212. The area between a Z of
0.31 and the mean is 0.1217, so the total area between the scores is:
$$\text{Area} = 0.3212 + 0.1217 = 0.4429$$
So the proportion of upper class respondents with 12 to 16 years of education is
0.4429.
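The table-lookup procedure itself can be mimicked in code: round Z to two decimals, then take the area between the mean and Z, which for the standard normal curve is Φ(|Z|) − 0.5. This is a sketch assuming Python 3.8+ (`statistics.NormalDist`); the helper names `z_score` and `area_to_mean` are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_score(y, mean, sd):
    """Standardize a raw score and round to two decimals, as when reading a printed table."""
    return round((y - mean) / sd, 2)


def area_to_mean(z):
    """Area between the mean and z on the standard normal curve."""
    return STANDARD_NORMAL.cdf(abs(z)) - 0.5


# Upper class respondents from Table 1: mean 15.00, standard deviation 3.27
z_low = z_score(12, 15.00, 3.27)
z_high = z_score(16, 15.00, 3.27)
area = area_to_mean(z_low) + area_to_mean(z_high)
print(z_low, z_high, round(area, 4))
```

Adding the two "area to the mean" pieces works only because the scores fall on opposite sides of the mean; for two scores on the same side you would subtract instead.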
3. What is the probability that a working class respondent, drawn at random from the
population, will have more than 16 years of education?
To answer this question, we standardize the raw score and find the area beyond the
Z score. As always, it helps to sketch the area we want first.
The Z score for a value of 16 is:
$$Z = \frac{16.00 - 12.64}{2.50} = \frac{3.36}{2.50} = 1.34$$
The area beyond a Z of 1.34 (Column C) is 0.0901. So the probability that a working
class respondent has more than 16 years of education is 0.0901.
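An upper-tail probability like this one is simply 1 minus the CDF at the raw score. A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`; the helper name `prob_above` is hypothetical. Since it uses the unrounded Z of 1.344, the result sits a little below the table value.

```python
from statistics import NormalDist


def prob_above(y, mean, sd):
    """P(Y > y) for a normally distributed variable (hypothetical helper)."""
    return 1 - NormalDist(mu=mean, sigma=sd).cdf(y)


# Working class: probability of more than 16 years of education (mean 12.64, sd 2.50)
p = prob_above(16, mean=12.64, sd=2.50)
print(round(p, 4))
```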
4. What is the probability that a middle class respondent, drawn at random from the
population, will have less than 12 years of education?
To answer this question, we standardize the raw score and find the area beyond the
Z score. Again, it helps to sketch the area we want first.
The Z score for a value of 12 is:
$$Z = \frac{12.00 - 14.19}{2.39} = \frac{-2.19}{2.39} = -0.92$$
Because the normal curve is symmetric, the area beyond a Z of -0.92 equals the area
beyond 0.92 (Column C), which is 0.1788. So the probability that a middle class
respondent has less than 12 years of education is 0.1788.
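The sketch below computes this lower-tail area two ways: once from Z rounded to two decimals, the way a printed table is used, and once exactly from the raw score. It assumes Python 3.8+ for `statistics.NormalDist`. Comparing the two shows how much the rounding of Z matters.

```python
from statistics import NormalDist

mean, sd = 14.19, 2.39  # middle class respondents, Table 1

z = round((12 - mean) / sd, 2)            # standardized score, rounded as for a table
p_table = NormalDist().cdf(z)             # area below the rounded Z
p_exact = NormalDist(mean, sd).cdf(12)    # area below 12 with no rounding
print(z, round(p_table, 4), round(p_exact, 4))
```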
The mean family income of a large southern city is $34,000, with a standard deviation (for
the population) of $5,000. Imagine that you took a sample of 200 city residents and you
calculated the mean income for that sample. Answer questions 5 – 6 based on this scenario.
5. What is the probability that your sample mean is between $33,000 and $34,000?
Since the population mean is $34,000, we want to find the area under the curve between
$33,000 and the mean. As before, it helps to start with a diagram of that area.
For this question, we use the sampling distribution of the sample mean. We can
standardize our sample mean by using the formula:
$$Z = \frac{\bar{Y} - \mu_Y}{\sigma_Y / \sqrt{N}}$$
Notice that the standard error is in the denominator. We can calculate the standard error
by using the formula:
$$\sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{N}}$$
The standard error of the sample mean is:
$$\sigma_{\bar{Y}} = \frac{\$5{,}000}{\sqrt{200}} = \$353.55$$
A sample mean of $33,000 corresponds to a Z score of:
$$Z = \frac{33{,}000 - 34{,}000}{353.55} = \frac{-1{,}000}{353.55} = -2.83$$
The area between this score and the mean ($34,000) is about 0.4977, which is
also the probability of obtaining a sample mean between $33,000 and $34,000.
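The sampling distribution of the mean can be modeled directly as a normal curve centered at the population mean with the standard error as its spread. A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`:

```python
import math
from statistics import NormalDist

mu, sigma, n = 34_000, 5_000, 200
se = sigma / math.sqrt(n)             # standard error of the mean
sampling_dist = NormalDist(mu, se)    # sampling distribution of the sample mean
p = sampling_dist.cdf(34_000) - sampling_dist.cdf(33_000)
print(round(se, 2), round(p, 4))
```

Building the distribution with the standard error, not the population standard deviation of $5,000, is the key step: individual incomes vary far more than means of 200 incomes do.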
6. What is the probability that the sample mean exceeds $37,000?
For this question, we want to find the area beyond the Z score. Again, we start with a
picture of that area. The Z score for a sample mean of $37,000 is:
$$Z = \frac{37{,}000 - 34{,}000}{353.55} = \frac{3{,}000}{353.55} = 8.49$$
This value is so large that it does not appear in Appendix B, which means that the
probability is essentially zero for all intents and purposes.
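To see just how small "essentially zero" is, the tail area can be computed with the complementary error function, since P(Z > z) = erfc(z / √2) / 2. This sketch uses only `math.erfc` from the Python standard library. Computing 1 minus the CDF instead would typically give exactly 0.0 in double precision, because the CDF rounds to 1.0 this far out in the tail.

```python
import math

mu, sigma, n = 34_000, 5_000, 200
se = sigma / math.sqrt(n)         # standard error of the mean
z = (37_000 - mu) / se            # standardized sample mean
# Upper-tail area via erfc keeps precision for very large z
p = 0.5 * math.erfc(z / math.sqrt(2))
print(round(z, 2), p)
```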
Use the data from Table 1 to complete questions 7–9.
7. Construct the 95 percent confidence interval for the mean number of years of
education for lower class and middle class respondents. Interpret the results.
The general formula for a confidence interval is:
$$CI = \bar{Y} \pm Z(\sigma_{\bar{Y}})$$
The formula tells us to take the sample mean and both subtract from it and add to it
the product of a Z score and the standard error. Since we are not given the population
standard deviation, we must estimate it with the sample standard deviation, so the
standard error becomes $S_{\bar{Y}} = S_Y / \sqrt{N}$. For a 95 percent confidence
interval, we choose a Z score of 1.96.
For lower class respondents, the standard error is:
$$S_{\bar{Y}} = \frac{S_Y}{\sqrt{N}} = \frac{3.18}{\sqrt{89}} = 0.337$$
The confidence interval is:
$$CI = 10.34 \pm 1.96(0.337) = 10.34 \pm 0.661 = 9.68 \text{ to } 11.00$$
We can interpret this by saying that we are 95 percent confident that the true mean is
no less than 9.68 and no greater than 11.00.
For middle class respondents, the standard error is:
$$S_{\bar{Y}} = \frac{S_Y}{\sqrt{N}} = \frac{2.39}{\sqrt{675}} = 0.092$$
The confidence interval is:
$$CI = 14.19 \pm 1.96(0.092) = 14.19 \pm 0.18 = 14.01 \text{ to } 14.37$$
We can interpret this by saying that we are 95 percent confident that the true mean is
no less than 14.01 and no greater than 14.37.
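Both intervals follow the same recipe, so they can be wrapped in one small function. A sketch using only the Python standard library; the name `confidence_interval` is hypothetical, and like the hand calculation it uses the sample standard deviation in place of σ with a fixed Z.

```python
import math


def confidence_interval(mean, sd, n, z):
    """Large-sample confidence interval: mean plus or minus z * (sd / sqrt(n))."""
    se = sd / math.sqrt(n)   # estimated standard error of the mean
    margin = z * se          # half-width of the interval
    return mean - margin, mean + margin


# 95 percent intervals (Z = 1.96) for the Table 1 data
lower_class = confidence_interval(10.34, 3.18, 89, z=1.96)
middle_class = confidence_interval(14.19, 2.39, 675, z=1.96)
print([round(x, 2) for x in lower_class])
print([round(x, 2) for x in middle_class])
```

The middle class interval is much narrower mainly because its N (675) is far larger than the lower class N (89), which shrinks the standard error.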
8. Construct the 99 percent confidence interval for the mean number of years of
education for lower class and middle class respondents. Interpret the results.
For a 99 percent confidence interval, we choose a Z score of 2.58.
For lower class respondents:
$$S_{\bar{Y}} = \frac{S_Y}{\sqrt{N}} = \frac{3.18}{\sqrt{89}} = 0.337$$
$$CI = 10.34 \pm 2.58(0.337) = 10.34 \pm 0.869 = 9.47 \text{ to } 11.21$$
We can interpret this by saying that we are 99 percent confident that the true mean is
no less than 9.47 and no greater than 11.21.
For middle class respondents:
$$S_{\bar{Y}} = \frac{S_Y}{\sqrt{N}} = \frac{2.39}{\sqrt{675}} = 0.092$$
$$CI = 14.19 \pm 2.58(0.092) = 14.19 \pm 0.237 = 13.95 \text{ to } 14.43$$
We can interpret this by saying that we are 99 percent confident that the true mean is
no less than 13.95 and no greater than 14.43.
9. As the confidence level increases, what happens to the size of the confidence
interval? How does the confidence interval affect the precision of the estimate?
As the confidence level rises, so does the width of the confidence interval. As the
width of the confidence interval increases, the precision of the estimate decreases.
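The trade-off between confidence and precision can be shown numerically by computing the interval width at several confidence levels for the same data. This sketch assumes Python 3.8+ for `statistics.NormalDist.inv_cdf`; the name `ci_width` is hypothetical. The exact critical values it derives (about 1.960 and 2.576) differ slightly from the rounded 1.96 and 2.58 used above.

```python
import math
from statistics import NormalDist


def ci_width(sd, n, level):
    """Width of a large-sample confidence interval at the given confidence level."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical Z value
    return 2 * z * sd / math.sqrt(n)


# Lower class respondents from Table 1 (sd 3.18, N = 89)
for level in (0.90, 0.95, 0.99):
    print(level, round(ci_width(3.18, 89, level), 3))
```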
10. It is known that, nationally, doctors working for Health Maintenance Organizations
(HMOs) average 13.5 years of experience in their specialties, with a standard
deviation of 7.6 years. The executive director of an HMO in a western state is
interested in determining whether or not its doctors have less experience than the
national average. A random sample of 150 doctors from the HMO shows a mean of
only 10.9 years of experience. Test the hypothesis that doctors in this HMO have less
experience than the national average. Use an alpha level of .01. Make certain to
follow the five steps in hypothesis testing.
This question involves a test comparing a single sample mean with a known population mean. It is a one-sided test.
1) The first step in hypothesis testing is to state the assumptions.
We assume:
1. A random sample was used.
2. The variable years of experience is measured at the interval-ratio level.
3. Because N > 50, the assumption of a normally distributed population is not required.
2) Second, we state the research and null hypotheses and select the alpha level.
We want to test the hypothesis that doctors in this HMO have less experience
than the national average, so this is a one-sided test:
H1: $\mu_Y < 13.5$ years
H0: $\mu_Y = 13.5$ years
We choose an alpha of .01.
3) Third, we select the sampling distribution and specify the test statistic.
We are given the population standard deviation, so we do not need to estimate
it with the sample standard deviation. Hence we can use the Z distribution.
The formula for the Z statistic is:
$$Z = \frac{\bar{Y} - \mu_Y}{\sigma_Y / \sqrt{N}}$$
4) Fourth, we compute the Z statistic by plugging the given values into the formula:
$$Z = \frac{10.9 - 13.5}{7.6 / \sqrt{150}} = \frac{-2.6}{7.6 / 12.25} = \frac{-2.6}{0.62} = -4.19$$
5) Finally, we make a decision and interpret the results.
Sketching the sampling distribution with the rejection region in the left tail makes
the decision easier. The obtained Z value is -4.19. For a one-tailed test, the p value
for a Z of -4.19 is less than .001, which is below the alpha level of .01, so we reject
the null hypothesis. We have evidence in favor of the research hypothesis, and we
conclude that doctors at this HMO have less experience, on average, than the national
average for HMO doctors.
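Steps 3 through 5 can be sketched as a small one-sample Z test for a lower-tailed alternative. This assumes Python 3.8+ for `statistics.NormalDist`; the function name `one_sample_z_test` is hypothetical. It returns the exact one-tailed p value rather than the "less than .001" bound read from a table.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, pop_mean, pop_sd, n):
    """Z statistic and lower-tail p value for H1: mu < pop_mean (sigma known)."""
    se = pop_sd / math.sqrt(n)        # standard error of the mean
    z = (sample_mean - pop_mean) / se
    p_lower = NormalDist().cdf(z)     # P(Z <= z) if H0 is true
    return z, p_lower


z, p = one_sample_z_test(10.9, 13.5, 7.6, 150)
alpha = 0.01
print(round(z, 2), p, "reject H0" if p < alpha else "fail to reject H0")
```

For the opposite research hypothesis (mu greater than 13.5) you would use the upper tail, 1 minus the CDF, instead.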
11. The 2000 International Social Survey Programme (ISSP) collected data on the
educational attainment of males and females. Based on a random sample of 618
cases, males were found to have an average of 11.85 years of education with a
standard deviation of 3.98 years. A random sample of 732 females found an average
of 11.34 years of education with a standard deviation of 3.74 years. Using a .05 alpha
level, test whether there is a significant difference in educational attainment between
men and women.
This question involves a test of the difference between two sample means. It is a two-sided test.
1) Assumptions:
1. Independent random samples are used.
2. The variable years of education is measured at the interval-ratio level of
measurement.
3. Because N1 > 50 and N2 > 50, the assumption of a normally distributed population
is not required.
4. The population variances are assumed to be equal.
2) Research and null hypotheses and alpha level
H1: $\mu_1 \neq \mu_2$
H0: $\mu_1 = \mu_2$
$\alpha = .05$
3) Sampling distribution and test statistic
Because we use the sample standard deviations to estimate the standard error of the
sampling distribution, we use the t distribution:
$$t = \frac{\bar{Y}_1 - \bar{Y}_2}{S_{\bar{Y}_1 - \bar{Y}_2}}$$
$$S_{\bar{Y}_1 - \bar{Y}_2} = \sqrt{\frac{(N_1 - 1)S_{Y_1}^2 + (N_2 - 1)S_{Y_2}^2}{(N_1 + N_2) - 2}}\sqrt{\frac{N_1 + N_2}{N_1 N_2}}$$
$$df = (N_1 + N_2) - 2$$
4) Computing the test statistic
$$df = 618 + 732 - 2 = 1{,}348$$
$$S_{\bar{Y}_1 - \bar{Y}_2} = \sqrt{\frac{(618 - 1)(3.98)^2 + (732 - 1)(3.74)^2}{(618 + 732) - 2}}\sqrt{\frac{618 + 732}{618(732)}}$$
$$= \sqrt{\frac{(617)(15.8404) + (731)(13.9876)}{1{,}348}}\sqrt{\frac{1{,}350}{452{,}376}}$$
$$= \sqrt{\frac{9{,}773.527 + 10{,}224.94}{1{,}348}}\sqrt{0.002984}$$
$$= \sqrt{\frac{19{,}998.46}{1{,}348}}(0.054628) = \sqrt{14.83565}(0.054628) = 3.851708(0.054628) = 0.21$$
$$t = \frac{11.85 - 11.34}{0.21} = \frac{0.51}{0.21} = 2.43$$
5) Make a decision and interpret the results
Sketching the sampling distribution with a rejection region in each tail makes the
decision easier. The obtained t of 2.43 has a two-tailed probability between .02 and
.01, which is less than the alpha of .05, so we reject the null hypothesis. (We do not
need to divide alpha by two, because the table gives the p value for the two-sided test.)
Based on the ISSP dataset, we conclude that there is a relationship between gender and
educational attainment: men have an average of 0.51 more years of education than women.
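The pooled-variance t test can be sketched in a few lines of standard-library Python (3.8+). The function name `pooled_t_test` is hypothetical. One swapped-in simplification to note: the standard library has no t distribution, so the sketch approximates the two-tailed p value with the standard normal curve, which is reasonable only because df = 1,348 is very large. For an exact t-based p value, a library such as SciPy would be needed.

```python
import math
from statistics import NormalDist


def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic using a pooled variance estimate (equal variances assumed)."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled_var) * math.sqrt((n1 + n2) / (n1 * n2))
    t = (mean1 - mean2) / se_diff
    df = n1 + n2 - 2
    return t, df, se_diff


t, df, se_diff = pooled_t_test(11.85, 3.98, 618, 11.34, 3.74, 732)
# Normal approximation to the t distribution, adequate only for large df
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(t)))
print(round(se_diff, 4), round(t, 2), df, round(p_two_tailed, 3))
```

Because the sketch carries the unrounded standard error (about 0.2104) instead of 0.21, the t value it prints is slightly smaller than the hand calculation, but the p value still falls between .01 and .02 and the decision is unchanged.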
Useful Formulas
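$$Z = \frac{Y - \bar{Y}}{S_Y}$$
$$\mu_{\bar{Y}} = \mu_Y \qquad \sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{N}}$$
$$CI = \bar{Y} \pm Z(\sigma_{\bar{Y}})$$
$$Z = \frac{\bar{Y} - \mu_Y}{\sigma_Y / \sqrt{N}}$$
$$t = \frac{\bar{Y} - \mu_Y}{S_Y / \sqrt{N}} \qquad df = N - 1$$
$$t = \frac{\bar{Y}_1 - \bar{Y}_2}{S_{\bar{Y}_1 - \bar{Y}_2}}$$
$$S_{\bar{Y}_1 - \bar{Y}_2} = \sqrt{\frac{(N_1 - 1)S_{Y_1}^2 + (N_2 - 1)S_{Y_2}^2}{(N_1 + N_2) - 2}}\sqrt{\frac{N_1 + N_2}{N_1 N_2}}$$
$$df = (N_1 + N_2) - 2$$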