EDRS 811 Final Sp. 12 1
EDRS 811 Final Exam
Spring, 2011, Brigham
The exam is an open book, open notes activity. You must work individually to complete this
task, however. Consultation with another person (living, dead, or imagined) is a violation of
the honor code and could cause a failing grade for the final. You are free to ask questions of
clarification but I cannot tell you how to do things at this point. This is as much an
evaluation of my teaching as your learning. Try to make me feel good about myself!
You may need to use Excel or a powerful calculator for some of the questions. You are
permitted to use SPSS but you should not really need to do so.
The exam is due by midnight on May 15. Earlier submissions are gratefully accepted! I will
place a slot on the Blackboard web site for you to upload your files so that you will be
certain that I have received them. Also, a copy of this document will be placed on
Blackboard for your use.
If you type your responses on this document, please put them in bold or some other form
that will make it easy for me to discriminate your work from mine.
1.
A study is conducted on students taking a statistics class. Several variables are recorded in
the survey. Identify each variable as categorical or quantitative.
A) Type of car the student owns. Categorical
B) Number of credit hours taken during that semester. Quantitative
C) The time the student waited in line at the bookstore to pay for his/her textbooks.
Quantitative
D) Home state of the student. Categorical
2.
A sample of employees of a large pharmaceutical company has been obtained. The length
of time (in months) they have worked for the company was recorded for each employee. A
stemplot of these data is shown below. In the stemplot 6|2 represents
62 months.
What would be a better way to represent this data set?
A) Display the data in a time plot.
B) Display the data in a boxplot.
C) Split the stems.
D) Use a histogram with class width equal to 10
3.
When examining a distribution of a quantitative variable, which of the following features
do we look for?
A) Overall shape, center, and spread.
B) Symmetry or skewness.
C) Deviations from overall patterns such as outliers.
D) The number of peaks or modes.
E) All of the above.
4.
Select the three most important elements from item three, list them in decreasing
order (most, second most, least), and explain why you believe each item is important
(e.g., how does inspection of that element help us in our statistical interpretation of
the data?).
1) Overall shape, center, and spread: gives you the best overall description of
the distribution of the data, with the mean, median, and spread each giving
important information.
2) Symmetry or skewness: skew will affect the mean, and this tells you which
way the data are trending.
3) Deviations from the overall pattern: an outlier will affect the mean, and this helps
you interpret any difference between the mean and the median.
5.
Statistics were gathered on the number of homicides committed with guns in Australia in
the years from 1980 to 2004. From these data the following graph was constructed:
This plot is a graph of a(n) _________ and it shows that there is (are) _______ in the data.
A) categorical variable; skewness to the right
B) histogram; multiple peaks
C) line; an increasing trend
D) quantitative variable; outlier values
E) time series; a decreasing trend
6.
The time to complete an exam is approximately Normal with a mean of 70 minutes and a
standard deviation of 10 minutes. Using the 68-95-99.7 rule, what percentage of students
will complete the exam in under an hour?
A) 68%
B) 32%
C) 16%
D) 5%
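The 68-95-99.7 reasoning in item 6 can be checked numerically. The short sketch below (standard library only; the variable names are illustrative) standardizes 60 minutes, applies the rule, and compares it with the exact Normal area from Python's `statistics.NormalDist`.

```python
from statistics import NormalDist

# Exam-time model from item 6: Normal with mean 70 and SD 10 minutes.
mu, sigma = 70, 10
z = (60 - mu) / sigma            # 60 minutes is one SD below the mean: z = -1

# 68-95-99.7 rule: 68% lie within 1 SD, so the remaining 32% is split
# evenly between the two tails, leaving about 16% below z = -1.
rule_estimate = (1 - 0.68) / 2

# Exact Normal area below 60 minutes, for comparison with the rule.
exact = NormalDist(mu, sigma).cdf(60)

print(f"z = {z}, rule: {rule_estimate:.2f}, exact: {exact:.4f}")
```

The rule gives 16%, and the exact area (about 0.1587) confirms that the rule is a close approximation here.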
7.
Using the standard Normal distribution tables, what is the area under the standard Normal
curve corresponding to Z < 1.1?
A) 0.1357
B) 0.2704
C) 0.8413
D) 0.8643
8.
Using the standard Normal distribution tables, what is the area under the standard Normal
curve corresponding to Z > –1.22?
A) 0.1151
B) 0.1112
C) 0.8849
D) 0.8888
9.
Using the standard Normal distribution tables, what is the area under the standard Normal
curve corresponding to –0.5 < Z < 1.2?
A) 0.3085
B) 0.8849
C) 0.5764
D) 0.2815
10.
The variable Z has a standard Normal distribution. Find the value z such that 85% of the
observations fall below z.
A) z = –1.04
B) z = 0.80
C) z = 0.85
D) z = 1.04
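As a cross-check on the table lookups in items 7 to 10, the sketch below computes the same areas and the 85th percentile with `statistics.NormalDist` instead of a printed table. Table values are rounded to four places, so the last digit can occasionally differ.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal: mean 0, SD 1

q7 = Z.cdf(1.1)                    # area to the left of 1.1
q8 = 1 - Z.cdf(-1.22)              # area to the right of -1.22
q9 = Z.cdf(1.2) - Z.cdf(-0.5)      # area between -0.5 and 1.2
q10 = Z.inv_cdf(0.85)              # z value with 85% of the area below it

for label, value in [("Q7", q7), ("Q8", q8), ("Q9", q9), ("Q10", q10)]:
    print(label, round(value, 4))
```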
11.
Consider the following Normal quantile plot:
What is the most striking feature of the plot?
A) The granularity.
B) The strong skewness indicated by the plot.
C) The many outliers evident in the plot.
D) The fact that Y is categorical.
Use the following to answer questions 12 and 13:
John’s parents recorded his height at various ages between 36 and 66 months. Below is a
record of the results:
Age (months)      36   48   54   60   66
Height (inches)   34   38   41   43   45
12.
Which of the following is the equation of the least-squares regression line of John’s
height on age? (Note: You do not need to directly calculate the least-squares regression
line to answer this question.)
A) Height = 12 × (Age)
B) Height = Age/12
C) Height = 60 – 0.22 × (Age)
D) Height = 22.3 + 0.34 × (Age)
13.
John’s parents decide to use the least-squares regression line of John’s height on
age to predict his height at age 21 years (252 months). What conclusion can we draw?
A) John’s height, in inches, should be about half his age, in months.
B) The parents will get a fairly accurate estimate of his height at age 21 years,
because the data are clearly correlated.
C) Such a prediction could be misleading, because it involves extrapolation.
D) All of the above.
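For items 12 and 13, a direct least-squares fit of John's five measurements shows both the size of the slope and what extrapolation does. This is a minimal sketch using the textbook formulas b1 = Sxy/Sxx and b0 = ȳ − b1·x̄; the variable names are illustrative. The fitted line is roughly Height = 20.46 + 0.374 × (Age), nearest to option D, and plugging in 252 months gives a height near 115 inches (about 9.5 feet).

```python
from statistics import fmean

# John's recorded ages (months) and heights (inches) from items 12 and 13.
ages = [36, 48, 54, 60, 66]
heights = [34, 38, 41, 43, 45]

# Least-squares slope b1 = Sxy / Sxx and intercept b0 = ybar - b1 * xbar.
x_bar, y_bar = fmean(ages), fmean(heights)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, heights))
s_xx = sum((x - x_bar) ** 2 for x in ages)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Extrapolating far outside the observed 36-66 month range (item 13).
predicted_at_252 = intercept + slope * 252

print(f"Height = {intercept:.2f} + {slope:.3f} x (Age)")
print(f"Predicted height at 252 months: {predicted_at_252:.1f} inches")
```

The implausible prediction at 252 months illustrates why extrapolating a line beyond the range of the data can mislead.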
14.
Which of the following scatterplots would indicate that y is growing linearly over
time?
A) [Scatterplot of Y (axis 0.0 to 20.0) against Time (axis 0 to 10)]
B) [Scatterplot of Y (axis 2.0 to 5.0) against Time (axis 0 to 10)]
C) [Scatterplot of Y (axis 0.0 to 12.0) against Time (axis 0 to 10)]
D) [Scatterplot of Y (axis 0 to 120) against Time (axis 0 to 10)]
Use the following information for items 15 & 16
Are avid readers more likely to wear glasses than those who read less frequently? Three hundred men in Ohio were selected at random and characterized as to whether they wore
glasses and whether the amount of reading they did was above average, average, or below
average. The results are presented in the following table:

                            Amount of reading
Glasses?    Above average    Average    Below average    Total
Yes              47             48           31           126
No               26             78           70           174

15.
What is the proportion of men in the sample who wear glasses?
0.42
16.
What is the proportion of all above average readers who wear glasses?
0.644
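The two proportions in items 15 and 16 differ only in their denominators: item 15 divides by all 300 men, while item 16 conditions on the above-average column. A small sketch (the dictionary layout is an illustrative choice) makes that explicit.

```python
# Counts from the glasses-by-reading table (items 15 and 16).
wears_glasses = {"above": 47, "average": 48, "below": 31}
no_glasses = {"above": 26, "average": 78, "below": 70}

total_men = sum(wears_glasses.values()) + sum(no_glasses.values())
p_glasses = sum(wears_glasses.values()) / total_men          # item 15

# Item 16 conditions on the above-average readers only.
above_total = wears_glasses["above"] + no_glasses["above"]
p_glasses_given_above = wears_glasses["above"] / above_total

print(total_men, round(p_glasses, 3), round(p_glasses_given_above, 3))
```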
Use the following information for items 17-19
The scores of individual students on the American College Testing (ACT) Program
Composite College Entrance Examination have a Normal distribution with mean 18.6 and
standard deviation 6.0. At Northside High, 36 seniors take the test. Assume the scores at
this school have the same distribution as national scores.
17.
What is the mean of the sampling distribution of the sample mean score for a
random sample of 36 students?
18.6
18.
What is the mean of the sampling distribution of the sample mean score for a
random sample of 36 students?
18.6
19.
What is the sampling distribution of the sample mean score for a random sample of
36 students?
A) Approximately Normal, but the approximation is poor.
B) Approximately Normal, and the approximation is good.
C) Exactly Normal.
D) Neither Normal nor non-Normal. It depends on the particular 36 students
selected.
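Items 17 to 19 rest on two facts about the sample mean: its sampling distribution has mean μ and standard deviation σ/√n, and when the population itself is Normal, x̄ is exactly Normal. A minimal sketch of the arithmetic for n = 36:

```python
import math

# Population of individual ACT scores (items 17-19).
mu, sigma, n = 18.6, 6.0, 36

mean_of_xbar = mu                   # the sample mean is unbiased for mu
sd_of_xbar = sigma / math.sqrt(n)   # spread shrinks by a factor of sqrt(n)

print(mean_of_xbar, sd_of_xbar)
```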
20.
A small New England college has a total of 400 students. The Math SAT is required
for admission, and the mean score of all 400 students is 620. The population
standard deviation is found to be 60. The formula for a 95% confidence interval
yields the interval 640 ± 5.88. Determine whether each of the following statements
is true or false.
A) If we repeated this procedure many, many times, only 5% of the 95%
confidence intervals would fail to include the mean Math SAT score of the
population of all students at this college. FALSE
B) The probability that the population mean will fall between 634.12 and
645.88 is 0.95. TRUE
C) The interval is incorrect. It is much too narrow. FALSE
D) If we repeated this procedure many, many times, x̄ would fall between
634.12 and 645.88 about 95% of the time. TRUE
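One way to see what the interval in item 20 is based on is to solve the margin-of-error formula, margin = z*·σ/√n, for n. Assuming z* = 1.96 for 95% confidence, the sketch below shows that a margin of 5.88 with σ = 60 corresponds to n = 400, which is the entire student body described in the item rather than a sample from it.

```python
# Item 20: sigma = 60 and the 95% margin of error is 5.88.
z_star, sigma, margin = 1.96, 60, 5.88

# margin = z* * sigma / sqrt(n)  =>  n = (z* * sigma / margin) ** 2
implied_n = (z_star * sigma / margin) ** 2
print(round(implied_n))
```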
21.
The scores on the Wechsler Intelligence Scale for Children (WISC) are thought to be
Normally distributed with a standard deviation of σ = 10. A simple random sample
of 25 children is taken, and each is given the WISC. The mean of the 25 scores is x̄ = 104.32.
Based on these data, what is a 95% confidence interval for μ?
A) 104.32 ± 0.78
B) 104.32 ± 3.29
C) 104.32 ± 3.92
D) 104.32 ± 19.60
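Because σ is known in item 21, the interval is the one-sample z interval x̄ ± z*·σ/√n. This sketch takes z* from `NormalDist().inv_cdf(0.975)` (about 1.96) rather than from a table.

```python
import math
from statistics import NormalDist

# Item 21: known sigma, so a z interval applies.
sigma, n, x_bar = 10, 25, 104.32
z_star = NormalDist().inv_cdf(0.975)      # about 1.96 for 95% confidence

margin = z_star * sigma / math.sqrt(n)
print(f"{x_bar} ± {margin:.2f}")
```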
22.
The larger the level of confidence, C, the ______ the confidence interval.
A) smaller
B) larger
C) None of the above.
23.
When we state the alternative hypothesis to look for a difference in a parameter in
either direction, we are doing a _____.
A) one-sided test
B) two-sided test
C) None of the above.
24.
Given that a test of significance was done for a two-sided test and the P-value
obtained was .02, what would be the P-value for a one-sided significance test?
A) 0.02
B) 0.04
C) 0.01
D) 0
25.
A simple random sample of six male patients over the age of 65 is being used in a
blood pressure study. The standard error of the mean blood pressure of these six
men was 22.8. What is the standard deviation of these six blood pressure
measurements?
A) 9.31
B) 50.98
C) 55.85
D) 136.8
26.
A simple random sample of 20 third-grade children from a certain school district is
selected, and each is given a test to measure his/her reading ability. You are
interested in calculating a 95% confidence interval for the population mean score.
In the sample, the mean score is 64 points, and the standard deviation is 12 points.
What is the margin of error associated with the confidence interval?
A) 2.68 points
B) 4.64 points
C) 5.62 points
D) 6.84 points
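In item 26 only the sample standard deviation is known, so the margin of error uses a t critical value with n − 1 = 19 degrees of freedom. Python's standard library has no t quantile function, so this sketch hard-codes t* = 2.093, the usual t-table value for df = 19 at 95% confidence; treat that constant as an assumption read from a table.

```python
import math

# Item 26: sigma is unknown, so use t with n - 1 = 19 degrees of freedom.
n, s = 20, 12
t_star = 2.093   # assumed value read from a t table for df = 19, 95% confidence

margin = t_star * s / math.sqrt(n)
print(f"margin of error = {margin:.2f} points")
```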
27.
Suppose a simple random sample of size n is drawn from an approximately Normal
population. What degrees of freedom should be used to perform a one-sample t
procedure?
A) n–1
B) n+1
C) n–2
D) n+2
28.
Matched pairs t procedures are for use on subjects that are _______.
A) independent
B) the same or similar
C) Normal
D) None of the above.
29.
One sample t test procedures are for use on subjects that are _______.
A) independent
B) the same or similar
C) binomial
D) None of the above.
Use the following information to answer items 30 & 31
Two statistics professors at two rival schools decide to use IQ scores as a measure of how
smart the students at their respective schools are. IQ scores are known to be Normally
distributed. The two professors will use this knowledge to their advantage. They will
randomly select 10 students from their respective schools and determine the students’ IQ
scores by means of the standard IQ test. The two professors will use the pooled version of
the two-sample t test to determine whether the students at the two universities are equally
smart. Let μ1 and μ2 represent the mean IQ scores of the students at the two universities.
Let σ1 and σ2 be the corresponding population standard deviations. The hypotheses they
will test are H0: μ1 – μ2 = 0 versus Ha: μ1 – μ2 ≠ 0. Based on the two samples of 10 students,
the two professors find the following information: x̄1 = 111, x̄2 = 120, s1 = 7, and s2 = 11.
30.
What is the value of the test statistic?
A) 0.107
B) –0.98
C) –2.18
D) –3.00
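The pooled two-sample t statistic in item 30 combines the two sample variances, weighted by their degrees of freedom, into one estimate sp². This sketch follows that textbook formula; the variable names are illustrative.

```python
import math

# Items 30-31: summary statistics for the two schools.
n1, x1, s1 = 10, 111, 7
n2, x2, s2 = 10, 120, 11

# Pooled variance weights each sample variance by its degrees of freedom.
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
t = (x1 - x2) / se
df = n1 + n2 - 2

print(f"sp^2 = {sp2:.1f}, t = {t:.2f}, df = {df}")
```

The resulting t of about −2.18 on 18 degrees of freedom is the value to compare against a t table for the one-sided alternative in item 31.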
31.
Suppose the professors had wished to test the hypotheses H0: μ1 = μ2 versus Ha: μ1 < μ2.
What can we say about the value of the P-value?
A) P-value < 0.01
B) 0.01 < P-value < 0.025
C) 0.025 < P-value < 0.05
D) P-value > 0.05
Use this information for items 32 & 33.
The 94 students in a statistics class are categorized by gender and by the year in school.
The numbers obtained are displayed below:

                             Year in school
Gender    Freshman   Sophomore   Junior   Senior   Graduate   Total
Male          1          2          9       17        2         31
Female       23         17         13        7        3         63
Total        24         19         22       24        5         94

32.
Suppose we wish to test the null hypothesis that there is no association between the
year in school and gender. Under the null hypothesis, what is the expected number
of male sophomores?
A) 2
B) 6
C) 6.27
D) 9.5
33.
It was calculated that the test statistic was X² = 8.083. The approximate P-value for this
test is then:
A) between 0.02 and 0.04.
B) between 0.01 and 0.02.
C) less than 0.01.
D) greater than 0.04.
E) between 0.15 and 0.25.
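Under the null hypothesis of no association, each expected count is (row total × column total) / grand total. For item 33, a 2 × 5 table has (2 − 1)(5 − 1) = 4 degrees of freedom, and for an even number of degrees of freedom the chi-square upper-tail probability has a closed form, so no table is needed. This sketch takes X² = 8.083 as given in the item and gives P ≈ 0.089.

```python
import math

# Item 32: marginal totals from the gender-by-year table.
male_total, sophomore_total, grand_total = 31, 19, 94
expected_male_soph = male_total * sophomore_total / grand_total

# Item 33: a 2 x 5 table has (2 - 1) * (5 - 1) = 4 degrees of freedom.
df = (2 - 1) * (5 - 1)
x2 = 8.083

# For even df the chi-square upper tail has a closed form:
# P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
half = x2 / 2
p_value = math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))

print(round(expected_male_soph, 2), df, round(p_value, 3))
```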
Use this information for items 34-36
The data referred to in this question were collected on 41 employees of a large company.
The company is trying to predict the current salary of its employees from their starting
salary (both expressed in thousands of dollars). The SPSS regression output is given below
as well as some summary measures:
34.
What is an (approximate) 95% confidence interval for the slope β1?
A) (–7.57, 4.39)
B) (–4.52, 1.34)
C) (1.80, 2.41)
D) (1.95, 2.26)
35.
Suppose we wish to test the hypotheses H0: β1 = 2 versus Ha: β1 ≠ 2. Together with
an insignificant constant in this model, this would imply that the employees
currently earn about twice as much as their starting salary. At the 5% significance
level, would we reject the null hypothesis?
A) Yes
B) No
C) This cannot be determined from the information given.
36.
What is the value of the estimate for σ, the standard deviation of the model
deviations εi?
A) 0.15
B) 2.93
C) 7.21
D) 52.0
The following information refers to items 37 – 40
The following SPSS output represents data collected on 89 middle-aged people. The
relationship between body weight and percent body fat is to be studied:
37.
What is the equation of the least-squares regression line?
Y = -11.570 + 0.165x
38.
What is the value of the correlation between body fat and body weight?
r = 0.5844
39.
Let ρ be the population correlation between body fat and body weight. What is the
value of the t statistic for testing the hypotheses H0: ρ = 0 versus Ha: ρ ≠ 0?
t = 3.93
40.
Is the slope significantly different from zero? Include the value of the test statistic
and the corresponding P-value in your answer.
The test statistic t = 3.93 leads to a p-value < 0.001, which means we can
reject the null hypothesis and say that the slope is significantly different from
zero.
41.
In a multiple regression with five explanatory variables, data are collected on 63
observations. What are the degrees of freedom for the ANOVA F test?
A) 4 and 57
B) 5 and 57
C) 5 and 58
D) 5 and 62
42.
In a multiple regression with four explanatory variables, data are collected on 25
observations. What is the largest value the ANOVA F statistic can take on before we
would reject the null hypothesis that all of the regression coefficients are 0, at the
5% significance level?
A) 2.78
B) 2.87
C) 3.10
D) 3.51
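Items 41 and 42 both depend on the degrees of freedom of the overall regression F test: p for the numerator and n − p − 1 for the denominator, assuming the model includes an intercept. The helper below is a hypothetical illustration of that rule. For item 42 the resulting (4, 20) degrees of freedom are then used to read the 5% critical value from an F table (about 2.87).

```python
def regression_f_df(n_obs, n_predictors):
    """Degrees of freedom (numerator, denominator) for the overall ANOVA F test
    in a multiple regression that includes an intercept."""
    return n_predictors, n_obs - n_predictors - 1

print(regression_f_df(63, 5))   # item 41
print(regression_f_df(25, 4))   # item 42
```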
Use this information for items 43 – 46.
The data referred to in this question were collected from several sales districts across the
country. The data represent sales for a maker of asphalt roofing shingles. Information on
the following variables is available:
Sales                 Sales from last year in thousands of squares
Expenditures          Promotional expenditures in thousands of dollars
Accounts              Number of active accounts
Competing Brands      Number of competing brands producing equivalent or similar products
District Potential    A coded indicator of the potential of the district
                      (higher score = better potential)
Partial SPSS regression output of a multiple regression model with sales as the response
variable and the other four variables as predictor variables is given below:
43.
How many districts were sampled in all?
A) 21
B) 24
C) 25
D) 26
44.
What is the estimate for the error variance σ²?
A) 9.604
B) 12.960
C) 92.245
D) 1937.137
45.
What proportion of the variation in sales is explained by the set of all four
explanatory variables?
A) –0.647
B) 0.558
C) 0.989
D) 0.995
46.
Which of the four explanatory variables seems to be the least significant in the
model?
A) Expenditures
B) Accounts
C) Competing Brands
D) District Potential
46.
An F test for the two coefficients of promotional expenditures and district potential
is performed. The hypotheses are: H0: β1 = β4 = 0 versus Ha: at least one of the βj is
not 0. The F statistic for this test is 1.482 with 2 and 21 degrees of freedom. What
can we say about the P-value for this test?
A) P-value < 0.025
B) 0.025 < P-value < 0.05
C) 0.05 < P-value < 0.10
D) P-value > 0.10
47.
A study compares five groups with 10 observations in each group. An F statistic of
3.75 is reported. What are the degrees of freedom for this F statistic?
A) 4 and 45
B) 4 and 46
C) 5 and 10
D) 5 and 50
48.
A study compares three population means. Three independent samples with 15
observations each are taken. The SSE = 1246 and the SST = 1600. What is the value
of the F statistic?
A) 1.11
B) 3.32
C) 4.98
D) 5.97
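Items 47 and 48 use the one-way ANOVA bookkeeping: with k groups and N total observations the degrees of freedom are k − 1 and N − k, and the between-groups sum of squares is SSG = SST − SSE. The helpers below are hypothetical names for a minimal sketch of those definitions.

```python
def one_way_df(k, n_total):
    """Numerator and denominator df for a one-way ANOVA with k groups."""
    return k - 1, n_total - k

def anova_f(sst, sse, k, n_total):
    """One-way ANOVA F statistic from the total and error sums of squares."""
    df_groups, df_error = one_way_df(k, n_total)
    msg = (sst - sse) / df_groups    # SSG = SST - SSE
    mse = sse / df_error
    return msg / mse

df_47 = one_way_df(k=5, n_total=50)                    # item 47: 5 groups of 10
f_48 = anova_f(sst=1600, sse=1246, k=3, n_total=45)    # item 48: 3 groups of 15
print(df_47, round(f_48, 2))
```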
Use the following information for items 49 – 51
A study was conducted to compare five different training programs for improving
endurance. Forty subjects were randomly divided into five groups of eight subjects in each
group. A different training program was assigned to each group. After two months, the
improvement in endurance was recorded for each subject. A one-way ANOVA is used to
compare the five training programs, and the resulting F statistic is 3.69.
49.
What can we say about the P-value for this F test?
A) P-value < 0.01
B) 0.01 < P-value < 0.025
C) 0.025 < P-value < 0.05
D) 0.05 < P-value < 0.10
50.
Which distribution was used to find the P-value?
A) F(4, 3)
B) F(5, 8)
C) F(4, 35)
D) t(39)
51.
At a significance level of 0.05, what is the appropriate conclusion about mean
improvement in endurance?
A) The average amount of improvement appears to be the same for all five
training programs.
B) The average amount of improvement appears to be different for each of the
five training programs.
C) It appears that at least one of the five training programs has a different
average amount of improvement.
D) One training program is significantly better than the other four.
52.
SPSS output for multiple comparisons is given below, using the Bonferroni method
with α = 0.05:

(I) Machine  (J) Machine  Mean difference (I–J)  Std. error  95% CI lower bound  95% CI upper bound
1            2            -15.75*                3.52        -25.37              -6.13
             3            .23                    3.52        -9.38               9.85
             4            3.08                   3.52        -6.54               12.70
2            1            15.75*                 3.52        6.13                25.37
             3            15.98*                 3.52        6.36                25.60
             4            18.83*                 3.52        9.21                28.44
3            1            -.23                   3.52        -9.85               9.38
             2            -15.98*                3.52        -25.60              -6.36
             4            2.85                   3.52        -6.77               12.46
4            1            -3.08                  3.52        -12.70              6.54
             2            -18.83                 3.52        -28.44              -9.21
             3            -2.85                  3.52        -12.46              6.77
What is the correct conclusion based on these comparisons?
A) Machine 1 seems to give different results from Machine 2. Machines 3 and 4 appear
indistinguishable.
B) Machine 2 seems to give different results from all other machines. Machines 1,
3, and 4 appear indistinguishable.
C) Machine 2 seems to be doing much better than the other three machines.
D) None of the above.
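A Bonferroni comparison is significant at the stated familywise level when its adjusted confidence interval excludes zero. This sketch copies the interval bounds from the item 52 output and lists the distinct machine pairs whose intervals exclude zero; every significant pair involves Machine 2.

```python
# Bonferroni intervals from item 52 as (I, J, lower, upper).
intervals = [
    (1, 2, -25.37, -6.13), (1, 3, -9.38, 9.85), (1, 4, -6.54, 12.70),
    (2, 1, 6.13, 25.37), (2, 3, 6.36, 25.60), (2, 4, 9.21, 28.44),
    (3, 1, -9.85, 9.38), (3, 2, -25.60, -6.36), (3, 4, -6.77, 12.46),
    (4, 1, -12.70, 6.54), (4, 2, -28.44, -9.21), (4, 3, -12.46, 6.77),
]

# A pair differs significantly when its interval excludes zero;
# (I, J) and (J, I) are merged into one unordered pair.
significant = sorted({tuple(sorted((i, j)))
                      for i, j, lo, hi in intervals if lo > 0 or hi < 0})
print(significant)
```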
Use the following information to complete items 53-55.
Consider the following groups in the order listed on the table:
Girls   Boys   Women   Men   Cats
53.
Complete the contrast coefficients to form the contrast for the group “Human young
vs. Human old.”
A) 1, 1, -1, -1, 0
B) 0, 1, 1, 1, 1
C) 1, 1, 1, 1, 0
54.
Complete the contrast coefficients to form the contrast for the group “Human male
vs. Human female.”
A) 1, 1, 1, 1, 0
B) -1, -1, -1, 0
C) 1, 1, -1, -1, 0
D) -1, 1, -1, 1, 0
55.
Complete the contrast coefficients to form the contrast for the group “Cats vs.
Human”
A) -1, -1, -1, -1, -4
B) -1, -1, -1, -1, 4
C) 1, 1, 1, 1, 4
D) 4, 4, 4, 4, 4
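A set of contrast coefficients needs one coefficient per group (five here) and must sum to zero. The sketch below screens the answer options of items 53 to 55 against that requirement; the labels such as "53A" are an illustrative naming scheme.

```python
# Answer options from items 53-55, in the group order Girls, Boys, Women, Men, Cats.
options = {
    "53A": [1, 1, -1, -1, 0], "53B": [0, 1, 1, 1, 1], "53C": [1, 1, 1, 1, 0],
    "54A": [1, 1, 1, 1, 0], "54B": [-1, -1, -1, 0], "54C": [1, 1, -1, -1, 0],
    "54D": [-1, 1, -1, 1, 0],
    "55A": [-1, -1, -1, -1, -4], "55B": [-1, -1, -1, -1, 4],
    "55C": [1, 1, 1, 1, 4], "55D": [4, 4, 4, 4, 4],
}

def is_contrast(coefs, n_groups=5):
    """A contrast needs one coefficient per group and coefficients summing to zero."""
    return len(coefs) == n_groups and sum(coefs) == 0

valid = [label for label, coefs in options.items() if is_contrast(coefs)]
print(valid)
```

Passing this check is necessary but not sufficient. For item 54 both C and D sum to zero, so the signs must also separate the right groups: C sets Girls and Boys against Women and Men (young vs. old), while D sets Boys and Men against Girls and Women (male vs. female).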
56.
By doing multiple comparisons when there are more than two experimental groups, we
increase the risk of making what kind of mistake?
A) Accepting H0
B) Type I error
C) Type II error
D) All of the above. (Hint: this is incorrect!)
Use the following to answer questions 57-60:
A realtor wishes to assess whether a difference exists between home prices in three
subdivisions. Independent samples of homes from each of the three subdivisions are
obtained and their prices are recorded. The analysis of variance results for comparing
these prices are provided below:

Source    Sum of squares    DF    Mean square    F
Groups        157.44         2       78.72       4.66
Error         253.50        15       16.90
Total         410.94        17
57.
How many homes were sampled in total?
A) 15
B) 17
C) 18
D) 19
58.
Under the null hypothesis of equality of population means, what is the appropriate
distribution for the test statistic?
A) F(2, 15)
B) F(2, 17)
C) N(μ, σ)
D) t(15)
59.
What is the value of the estimate of the common population standard deviation of
the populations of home prices in the three subdivisions?
A) 4.11
B) 8.87
C) 16.90
D) 78.72
60.
What can we say about the P-value for this F test?
A) P-value < 0.01
B) 0.01 < P-value < 0.025
C) 0.025 < P-value < 0.05
D) 0.05 < P-value < 0.10
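Every quantity asked for in items 57 to 59 can be read off or derived from the ANOVA table: the total degrees of freedom are N − 1, each mean square is its sum of squares divided by its degrees of freedom, and the pooled estimate of the common standard deviation is the square root of the error mean square. A minimal sketch of those steps:

```python
import math

# One-way ANOVA table for the three subdivisions (items 57-60).
ss_groups, ss_error = 157.44, 253.50
df_groups, df_error = 2, 15

n_homes = df_groups + df_error + 1          # df_total = N - 1, so N = 18
ms_groups = ss_groups / df_groups           # 78.72
ms_error = ss_error / df_error              # 16.90, the pooled variance
pooled_sd = math.sqrt(ms_error)             # estimate of the common sigma
f_stat = ms_groups / ms_error               # compare with F(2, 15)

print(n_homes, round(pooled_sd, 2), round(f_stat, 2))
```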
Congratulations! You have completed EDRS 811. Thank you for your hard work on this
very abstract material. Go buy yourself something ridiculous to commemorate the event!