COB 191
Statistical Methods
Exam 3
Fall 2007
Name______________________________
Section (circle): 1:25 PM 2:30 PM
Dr. Scott Stevens
DO NOT TURN TO THE NEXT PAGE UNTIL YOU ARE INSTRUCTED TO DO SO!
The following exam consists of 40 questions, each worth 2.5 points. You will have 90
minutes to complete the test. This means that you have, on average, 2 minutes, 15
seconds per question.
Record your answer to each question on the scantron sheet provided. You are welcome
to write on this exam, but only the answers recorded on your scantron will be graded.
Read carefully, and check your answers. Don’t let yourself write nonsense.
Keep your eyes on your own paper. If you believe that someone sitting near you is
cheating, raise your hand and quietly inform me of this. I'll keep an eye peeled, and
your anonymity will be respected.
If any question seems unclear or ambiguous to you, raise your hand, and I will attempt to
clarify it.
Be sure you correctly record your student number on your scantron, and blacken in the
corresponding digits. Failure to do so will cost you 10 points on this exam!
Pledge:
On my honor as a JMU student, I pledge that I have neither given nor received
unauthorized assistance on this examination.
Signature ______________________________________
Possibly Useful Excel Functions
=NORMDIST(x value, mean, st. dev, TRUE)
=NORMINV(probability, mean, st. dev)
=TDIST(t value, deg. freedom, # of tails)
=NORMSDIST(z value)
=NORMSINV(probability)
=TINV(probability, deg. freedom)
[Diagrams: NORMINV and NORMDIST deal with the WHITE area; TINV and TDIST deal with the WHITE area.]
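The same tail conventions, sketched in Python with scipy.stats (an illustrative assumption; the exam itself works in Excel):

    from scipy.stats import norm, t

    norm.cdf(1.28)                     # =NORMSDIST(1.28): area to the LEFT of z = 1.28
    norm.ppf(0.90)                     # =NORMSINV(0.90): z with 90% of the area to its left
    norm.cdf(910000, 850000, 60000)    # =NORMDIST(910000, 850000, 60000, TRUE)
    norm.ppf(0.95, 850000, 60000)      # =NORMINV(0.95, 850000, 60000)
    t.sf(2.0, 99)                      # =TDIST(2.0, 99, 1): upper-tail area for t = 2.0
    2 * t.sf(2.0, 99)                  # =TDIST(2.0, 99, 2): two-tailed area
    t.ppf(1 - 0.05 / 2, 99)            # =TINV(0.05, 99): two-tailed critical t*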
Questions 1 - 5 deal with the Virginia Construction scenario, below.
One hundred state-funded building projects in the state of Virginia were randomly
selected, and the construction cost of each was recorded. The mean construction
cost for these 100 projects was $850,000 with a standard deviation of $60,000.
1. Given this information, we would expect that the population standard deviation, σ
a) should be about $600.
b) should be about $6,000.
c) should be about $60,000.
d) should be about $600,000.
e) must be exactly $60,000.
2. Given this information, we could determine the 95% confidence interval for
I. the average cost of a state-funded building project in Virginia.
II. the fraction of state-funded building projects in Virginia that cost more
than $850,000.
III. the fraction of state-funded building projects in Virginia that cost more
than $910,000.
a) I only
b) II only
c) III only
d) I and II only e) I, II and III
3. Suppose that we use the given information to test (at the 10% level of
significance) the null hypothesis that the average cost of a state-funded building
project in Virginia was less than or equal to $900,000. This test would be a
a) one-tailed z test b) one-tailed t test c) two-tailed z test
d) two-tailed t test e) paired difference of means test
4. Suppose that we use the given information to test (at the 10% level of
significance) the null hypothesis that the average cost of a state-funded building
project in Virginia was less than or equal to $900,000. The conclusion of this test
would be
a) reject the null hypothesis
b) accept the null hypothesis
c) do not reject the null hypothesis.
d) do not accept the null hypothesis.
5. Suppose that we use the given information to test (at the 10% level of
significance) the null hypothesis that the average cost of a state-funded building
project in Virginia was less than or equal to $900,000. Which expression below
would give the value of α appropriate for this test?
a) =tdist(50000, 99, 1)
b) =tdist(8.333,99,1)
c) =normsdist(-8.333)
d) =normsdist(-0.8333)
e) 0.1
End of Virginia Construction Scenario
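A minimal sketch, in Python (an illustrative assumption), of the standard-error and test-statistic arithmetic that questions 3 - 5 turn on:

    import math

    n, xbar, s = 100, 850_000, 60_000   # sample size, sample mean, sample st. dev
    mu0 = 900_000                       # hypothesized mean under the null

    se = s / math.sqrt(n)               # 60,000 / 10 = 6,000
    t_stat = (xbar - mu0) / se          # (850,000 - 900,000) / 6,000, about -8.33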
Questions 6 - 17 deal with the Spell Check study described below.
A randomly selected group of 300 registered voters were asked to identify the party with
which they were registered. 160 said that they were registered Republicans, 120 said that
they were registered Democrats, and the remaining 20 said that they were registered either
Independent, or with some other party. Each person was then asked to take a short
dictation, writing down this sentence on a piece of paper:
“It’s unbelievable that you’re having a lot of arguments about grammar.”
The transcriptions were then checked for spelling errors. The results are shown below in a
crosstab table. For example, 56 of the respondents were Republicans who made at least
one spelling error in transcribing the sentence.
                 No spelling errors    At least one spelling error
Republican              104                        56
Democrat                 84                        36
Other                    18                         2
In these samples, the mean number of errors made by Republicans was 0.40 with a
standard deviation of 0.60. The mean number of errors made by Democrats was 0.85 with
a standard deviation of 1.50. For “Others”, the figures were 0.10 for the mean and 0.30
for the standard deviation.
Questions 6 to 8 deal only with the Democrats.
6. Suppose we focus for the moment only on Democrats. Then the calculation
36/120 would give the value of
a) π   b) x̄   c) μ   d) p   e) σx̄
7. A study of the populace at large has shown that 33% of all adults make at least
one spelling error when transcribing the statement. We would like to determine if
there is evidence that registered Democrats have a lower error rate than the
populace at large. What would be the appropriate null hypothesis for this test?
a) μ < 0.33
b) p > 0.33
c) μ = 0.33
d) π < 0.33
e) π > 0.33
8. Continuing with question #7: the appropriate hypothesis test would be a
a) one population t test
b) one population z test
c) two population t test
d) two population z test
e) paired differences test
9. We focus now only on the population of all registered Republicans. We’ll
conduct a hypothesis test at the 0.10 level of significance. The null hypothesis
under consideration is that at most 1/3 of all registered Republicans would
make spelling errors in their transcriptions. In our sample, 35% of the
Republicans made spelling errors. This hypothesis test has a P-value of 0.3274.
We conclude that we cannot reject the null hypothesis. Why?
a) Because 0.10 is less than 0.3333
b) Because 0.3333 is less than 0.35
c) Because 0.3274 is greater than 0.10
d) Because 0.3274 is less than 0.3333
e) Because 0.3274 is less than 0.35
10. We focus now only on registered voters who are neither Democrat nor
Republican. We want to test the null hypothesis that spelling errors would occur
in the transcriptions of at most 5% of such “Other” voters. A few moments of
consideration, and we’ll be done with this problem. Why?
a) Because only 10% of the sample of “Other” voters made mistakes, so we
would accept the null hypothesis as true.
b) Because only 10% of the sample of “Other” voters made mistakes, so the
sample is in the “nonrejectable tail”.
c) Because the sample size (20) is too small to use unless we know that the
population is nearly normal.
d) Because the number of successes in the sample (2) is less than 5.
e) Because the expected number of successes in a sample of this size (1) is less than 5.
11. Suppose that I believe that the average number of mistakes made in transcription
is greater for registered Democrats than it is for registered Republicans. I want to
use the data provided to conduct the appropriate hypothesis test at the 5% level of
significance. What kind of test is appropriate?
a) One-tailed, two population test of means
b) Two-tailed, two population test of means
c) One-tailed, two population test of proportions
d) Two-tailed, two population test of proportions
e) One-tailed paired difference test of means
12. The distribution of the number of errors made by Democrats in the sample is
shown below. You wish to build the 90% confidence interval for the average
number of spelling errors that would be made by registered Democratic voters on
this transcription test. Which line of thought is most relevant for beginning this
task?
a) “The population is not normal, so the test cannot be conducted.”
b) “There are at least five people who made errors and at least five people who
did not, so I can proceed.”
c) “The sample is quite skewed, so I cannot construct the interval.”
d) “The sample is quite skewed but my sample size is large, so I can still
construct the interval.”
e) “The normality of the population is irrelevant to me because I am building a
confidence interval for a proportion.”
[Histogram for question 12: "Number of Errors Made By Democratic Respondents in Sample" — frequency (0 to 90) plotted against errors per respondent (0 through 5).]
13. I conduct a two-tailed hypothesis test at the 0.50 level of significance. (Note that
this is an uncommonly high level of significance!) The null hypothesis is that the
same proportion of Republicans as Democrats make spelling errors on the
transcription. The P-value for this test is 0.378. At the 0.50 level of significance,
what do I conclude? (Be sure to read all of the answers!)
a) There is no difference between the fraction of registered Republicans and
registered Democrats who would make spelling errors on the test.
b) There is no reason to conclude that there is a difference between the fraction
of registered Republicans and registered Democrats who would make spelling
errors on the test.
c) Registered Democrats are more likely to make spelling errors on the test than
registered Republicans.
d) 37.8% of registered Republicans and Democrats would make spelling errors
on the test.
e) The fraction of registered Democrats who would make spelling errors on the
test is different than the fraction of registered Republicans who would make a
spelling error on the test.
14. By using such a very large level of significance in problem #13, I made myself
very vulnerable to a particular kind of error. Which of the following correctly
expands this point?
a) I’m very likely to reject a null hypothesis that is in fact true.
b) I’m very likely to fail to reject a null hypothesis that is actually false.
c) I’m very likely to use a larger sample size than is actually needed to conduct
the test.
d) I’m very likely to reject a null hypothesis that is actually false.
e) I’m very likely to accept a null hypothesis, even though doing so is always a
mistake.
15. Based on the data given, I want to construct the 80% confidence interval for the
fraction of registered Republicans who would make a spelling error (or several)
on the dictation test. As a first step, compute σp, the standard deviation of the
sample proportion. Your answer must be accurate to at least 4 decimal places.
a) 0.0373
b) 0.0377
c) 0.3274
d) 0.3333
e) 0.3356
16. Continuing with #15: The Excel command that would allow you to find the
appropriate critical score for the computation of this interval is
a) =TDIST(0.20,159,2)
b) =TINV(0.20,159)
c) =TINV(0.40,159)
d) =NORMSDIST(0.80)
e) =NORMSINV(0.90)
17. The answer to question #16, numerically, is 1.28. Pretend for this question only
that the answer to question 15 was 0.10. Find the lower limit of the 80%
confidence interval for the fraction of registered Republicans who would make a
spelling error (or several) on the dictation test.
a) 0.205
b) 0.222
c) 0.301
d) 0.461
e) 0.478
End of the Spell Check Study Scenario
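A minimal sketch, in Python (an illustrative assumption), of the proportion confidence-interval formula that questions 15 - 17 work through:

    import math

    n, successes = 160, 56                    # registered Republicans; those who made errors
    p = successes / n                         # sample proportion
    sigma_p = math.sqrt(p * (1 - p) / n)      # standard deviation of the sample proportion
    z_star = 1.28                             # critical z for 80% confidence (question 17)
    lower, upper = p - z_star * sigma_p, p + z_star * sigma_p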
18. Based on a sample of 400 randomly selected new US houses, we determine a 90%
confidence interval for the mean price of a new US house as $260,000 to
$280,000. This means that
a) we are 90% confident that the mean price for our sample is between $260,000
and $280,000.
b) we are 90% confident that the mean price of all new US houses is between
$260,000 and $280,000.
c) about 90% of our sample had prices between $260,000 and $280,000.
d) about 90% of all new US houses have prices between $260,000 and $280,000.
e) if we looked at all possible samples of 400 randomly selected new US houses,
about 90% of those samples would have an average price between $260,000
and $280,000.
Questions 19 - 21 are based on the Drive to Work scenario, described below.
I time my drive from home to work on 50 different days, and find that the average
time is 16.25 minutes. Assuming that these times constitute a random sample drawn
from a population whose standard deviation is 3.50 minutes, I wish to construct the
96% confidence interval for the mean time required for me to get to work.
19. What symbol would be used to represent the value 3.50 in this problem?
a) s
b) σ
c) σx̄
d) s²
e) σ²
20. The critical value (z* or t*) needed for the construction of this confidence interval
could be computed in Excel by the command
a) =NORMSINV(0.92)
b) =NORMSINV(0.96)
c) =NORMSINV(0.98)
d) =TINV(0.04,49)
e) =TINV(0.04,50)
21. The margin of error (MOE) for the confidence interval would be computed by
multiplying your answer to question number 20 by
a) 0.0700
b) 0.2450
c) 0.4950
d) 0.500
e) 1.732
End of the Drive to Work Scenario
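A minimal sketch, in Python (an illustrative assumption), of the known-sigma confidence-interval arithmetic this scenario describes:

    import math
    from scipy.stats import norm

    n, xbar, sigma = 50, 16.25, 3.50
    z_star = norm.ppf(0.98)                # 96% confidence leaves 2% in each tail
    moe = z_star * sigma / math.sqrt(n)    # margin of error = z* times sigma / sqrt(n)
    interval = (xbar - moe, xbar + moe)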
Questions 22-24 deal with the Radiation Therapy scenario below.
The radiation machine in a cancer ward is periodically checked to be sure that the
level of radiation indicated on its control panel is close to what is actually
delivered to the patient. This is important since too low a radiation dose may not
kill the cancer, while too high a radiation dose may actually cause severe damage
to the patient.
To conduct this test, a special sensor is attached to the radiation machine for a
time. Each time the machine is used, the sensor can be activated to record both
the actual radiation level of the treatment and the radiation level indicated on the
control panel. The cost of gathering and recording this information is 1 cent per
use recorded. After a month during which data may be collected, a hypothesis
test will be conducted using the sensor data. The null hypothesis is that the
average level of radiation indicated on the control panel is the same as the
average level actually delivered. If this hypothesis is rejected, a repair team
costing $100 will be called in to adjust the machine. Note that the actual number
of times the machine is used during the month will be the same, regardless of
how many times the sensor is activated.
22. Given the consequences discussed above, the hypothesis test should be conducted
with
a) a small sample and small value of alpha
b) a small sample and a large value of alpha
c) a large sample and a small value of alpha
d) a large sample and a large value of alpha
e) a small value of alpha and a large value of beta
23. Suppose the hospital administration reviews the test results and concludes that
there is no reason to believe the machine is malfunctioning. Later, they find that
the machine was giving twice the radiation, on average, that was indicated by the
controls. This would be an example of
a) careless extrapolation
b) a skewed population
c) type I error
d) type II error
e) type III error
24. The sensor records the actual dosage and the reported dosage for each patient
receiving treatment. The best way to conduct the hypothesis test from this data is
a) as the difference of two means with variances assumed equal.
b) as the difference of two means with variances not assumed equal.
c) as a paired difference, resulting in a test of a single mean.
d) as a difference of two proportions test.
e) as a blind stab in the dark.
End of Radiation Therapy Scenario
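A minimal sketch, in Python with made-up readings (an illustrative assumption), of how paired sensor data reduce to a test on a single mean of differences:

    from scipy.stats import ttest_1samp

    # hypothetical indicated vs. actual doses recorded by the sensor
    indicated = [2.00, 1.80, 2.20, 1.90, 2.10]
    actual    = [2.05, 1.85, 2.30, 1.95, 2.20]

    diffs = [a - i for a, i in zip(actual, indicated)]
    t_stat, p_value = ttest_1samp(diffs, popmean=0)   # H0: mean difference is zero (two-tailed)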
Questions 25 - 27 deal with the Christmas Tree scenario, below.
George plans on harvesting the evergreens in a particular acre of his land to be
sold as Christmas trees. He has been told that there is virtually no market for trees
shorter than 5 feet or taller than 8 feet. Trees between 5 and 8 feet in height can be
sold.
George wants to make sure that his trees can be sold, so he has his daughter take a
random sample of 100 of his evergreens and measure their heights. Using the
heights of these 100 trees, she (correctly) builds the 95% confidence interval for
the mean tree height. It runs from 6 feet, 4 inches to 7 feet, 4 inches. As a
consequence, George orders the evergreens on his property harvested for sale as
Christmas trees.
25. What was the average height of the 100 trees in the sample?
a) 6 feet, 4 inches
b) 6 feet, 10 inches
c) 7 feet, 4 inches
d) definitely between 6’4” and 7’4”, but we can’t be more precise than that
e) impossible to say from the information given
26. What was the average height of all of the evergreens on the acre of property?
a) 6 feet, 4 inches
b) 6 feet, 10 inches
c) 7 feet, 4 inches
d) definitely between 6’4” and 7’4”, but we can’t be more precise than that
e) impossible to say from the information given
27. When George has his trees harvested, he discovers that about 3/4 of his trees are
either smaller than 5’ or taller than 8’. In light of the confidence interval work, he
is shocked. He thought that almost all of his evergreens would be in the range of
trees that could be sold. What important factor is the most sensible explanation of
this “discrepancy”?
a) There is a 5% chance that the confidence interval doesn’t contain the
population mean. Evidently, his daughter randomly chose a sample that is not
representative of the population.
b) The heights of evergreen trees on the acre might not be normally distributed.
c) George’s daughter may have mismeasured the heights of some of the trees in
the sample.
d) The confidence interval deals only with the average height of evergreens on
the property, not the heights of individual trees.
e) George doesn’t know how many evergreens were on his acre of land.
End of Christmas Tree Scenario
Questions 28 - 31 deal with the Cholesterol scenario below.
For the general population of men, the mean cholesterol level is 215. It has been noted
by some physicians that men who have creased earlobes tend to have higher cholesterol
levels than the population of men at large. To explore this theory, we take as our null
hypothesis H0: μcrease < 215, where μcrease is the average cholesterol level for men with
creased earlobes. We take a random sample of 20 men who have creased earlobes and
measure their cholesterol levels. We find that the average cholesterol level for these men
is 257, with a standard deviation of 10. We’ll assume that cholesterol levels are
normally distributed.
28. Find the t value for this sample. It is
a) 4.2
b) 18.78
c) 25.7
d) 84
e) 114.9
29. Let t be the value for t found in question 28. To find the P-value corresponding to
this t value in Excel, we would enter
a) =TDIST(t, 19, 1)
b) =TDIST(t, 19, 2)
c) =TDIST(t, 20, 1)
d) =TDIST(t, 20, 2)
e) =TINV(t, 19)/2
30. The critical t value for this test, t*, depends upon
I. the value of alpha
II. the sample size
III. the sample mean
a) I only
b) II only
c) III only
d) I and II only
e) II and III only
31. The t value from this sample is larger than t*. What should we conclude?
a) There is no evidence that men with creased earlobes have a higher average
cholesterol level than men at large.
b) There is evidence that men with creased earlobes have a higher average
cholesterol level than men at large.
c) There is no evidence that men with creased earlobes have a lower average
cholesterol level than men at large.
d) There is evidence that men with creased earlobes have a lower average
cholesterol level than men at large.
e) We cannot conclude any of these statements, because the sample’s mean is in the
nonrejectable tail of the hypothesis test.
End of the Cholesterol scenario
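A minimal sketch, in Python (an illustrative assumption), of the one-sample t arithmetic behind questions 28 and 29:

    import math
    from scipy.stats import t

    n, xbar, s, mu0 = 20, 257, 10, 215
    t_stat = (xbar - mu0) / (s / math.sqrt(n))   # (257 - 215) / (10 / sqrt(20))
    p_value = t.sf(t_stat, n - 1)                # upper-tail area, as in =TDIST(t, 19, 1)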
32. Increasing the size of alpha used for a hypothesis test will
I. Decrease the size of the nonrejection region.
II. Decrease the chance of rejecting a true null hypothesis.
III. Decrease the chance of rejecting a false null hypothesis.
a) I only
b) II only
c) I and II only
d) I and III only
e) II and III only
33. The degrees of freedom for the t statistic in a one mean hypothesis test is the same
as
a)
b)
c) n-1
d) σ/√n
e) s/√(n-1)
34. I want to construct the 99% confidence interval for the population proportion. z*
for 99% confidence is 2.58. I have no idea of what the population proportion
might actually be, but I imagine that it’s somewhere between 0.05 and 0.95.
What’s the smallest sample that I could take to assure that the resulting 99%
confidence interval has a margin of error of no more than 0.05?
a) 26
b) 127
c) 666
d) 2663
e) 16,441
35. I want to construct the 99% confidence interval for the population mean. z* for
99% confidence is 2.58. I know that the population standard deviation is 16.
What is the smallest sample that I can take to assure that the resulting 99%
confidence interval will have a total width (from lower limit to upper limit) of no
more than 10? (You may assume that the population is normally distributed.)
a) 2
b) 9
c) 18
d) 36
e) 69
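A minimal sketch, in Python (an illustrative assumption), of the sample-size arithmetic that questions 34 and 35 rely on:

    import math

    z_star = 2.58

    # Proportion (question 34): worst-case p = 0.5, margin of error 0.05
    n_proportion = math.ceil((z_star / 0.05) ** 2 * 0.5 * 0.5)

    # Mean (question 35): sigma = 16, total width 10, so margin of error 5
    n_mean = math.ceil((z_star * 16 / 5) ** 2)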
36. As the degrees of freedom of a t distribution grows larger and larger, the t
distribution
a) becomes nearly identical to the z distribution.
b) becomes narrower and narrower, approaching a single “spike”.
c) becomes wider and wider, approaching a “flat line”.
d) becomes negative.
e) becomes undefined.
37. I conduct a one-tailed hypothesis test of the mean for a population whose standard
deviation I do not know. The rejection region for this test is in the lower tail. I
take a sample of 100 observations and find that the t value computed for my
sample is -1.1. I wish to know the P-value of this sample in this one-tailed test.
Which calculation in Excel would give me this value?
a) =TDIST(-1.1, 99, 2)
b) =TDIST(1.1, 99, 2)
c) =TDIST(1.1, 99, 1)
d) =TDIST(-1.1, 99, 1)/2
e) =TDIST(-1.1, 99, 1)*2
38. I want to conduct a one-tailed hypothesis test of the mean for a population whose
standard deviation I do not know. The rejection region for this test is in the upper
tail. I take a sample of 100 observations. I want to use a 0.05 level of
significance. What calculation in Excel would give me t*, the critical value of t
for this test?
a) =TINV(0.05,99) b) =-TINV(0.05,99) c) =TINV(0.10,99)
d) =TINV(0.05,99)*2
e) =TINV(0.05,99)/2
39. A firm is trying to decide if sending its office staff to a word processing training
program is worthwhile. It feels that the program is worthwhile if the average
typing speed of its office staff members would increase by more than 5 words per
minute. To explore the possibility, the firm has 35 of its office staff go through
the training program. A typing sample is then given to 70 of the company’s
employees (the 35 who received the training and another 35 who did not), and the
typing speed of each employee is recorded.
What null hypothesis would be the best to use with the available data?
a) after - before < 0 where before is the mean typing rate before the training and after is
the mean typing rate after the training.
b) after - before < 5 where before is the mean typing rate before the training and after is
the mean typing rate after the training.
c) after - before > 0 where before is the mean typing rate before the training and after is
the mean typing rate after the training.
d) after - before > 5 where before is the mean typing rate before the training and after is
the mean typing rate after the training.
e) difference > 5, where difference is the average improvement in each member of the office
staff, where “improvement” is computed by taking a staff member’s typing rate after
the training and subtracting off his or her typing rate before training.
40. Estimation and hypothesis testing are examples of
a) summary statistics
b) linear methods
c) descriptive statistics
d) seat-of-the-pants methods
e) inferential statistics