Download Test - FloridaMAO

March Statewide Invitational
Statistics Individual
Important Instructions for this Test: Round all final answers to the appropriate decimal place as indicated by the
answer choices. Round any intermediate steps as indicated or as necessary to make the final answer as accurate as
possible. For example, if answer choices are rounded to hundredths place, then round your final answer to hundredths
place and choose the closest option. Good luck, and as always: “NOTA” stands for “None of These Answers is correct.”
Use the following information to answer questions 1 – 9.
Mr. Callow wishes to investigate the relationship between gender and AP Literature exam score among the seniors at his
small high school of only 160 students, exactly half of which are female and half are male. He uses the 20 senior students
from his current AP Literature class, which has exactly 10 boys and 10 girls, and coincidentally, the ten boys and ten girls
also happen to be couples that have been dating since freshman year. The results are summarized in the table below.
Couple              Girl      Boy       Difference (Girl – Boy)
A                   5         1         4
B                   5         1         4
C                   5         1         4
D                   5         1         4
E                   5         1         4
F                   4         1         3
G                   4         2         2
H                   4         2         2
I                   4         2         2
J                   1         5         -4
Mean                ____      1.7       ____
Standard Deviation  1.229273  ____      2.460804
Maximum             5         5         4
Q3                  5         ____      4
Median              ____      1         ____
Q1                  4         1         2
Minimum             1         1         -4
1. Fill in the six missing values in the summary statistics section of the table rounding the missing sample standard
deviation to six decimal places. What is the sum of these six missing values?
A) 17.951666
B) 17.887434
C) 15.951666
D) 15.887434
E) NOTA
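As a rough check on the summary rows, the sketch below recomputes the mean, sample standard deviation, and five-number summary for each column from the ten couples listed in the table. The quartile rule (median of the lower and upper halves, as on most classroom calculators) is an assumption, and `five_number_summary` is a helper name introduced here for illustration only.

```python
import statistics

# Scores for couples A through J, copied from the table
girls = [5, 5, 5, 5, 5, 4, 4, 4, 4, 1]
boys = [1, 1, 1, 1, 1, 1, 2, 2, 2, 5]
diffs = [g - b for g, b in zip(girls, boys)]  # Difference (Girl - Boy)

def five_number_summary(values):
    """Min, Q1, median, Q3, max using the median-of-halves quartile rule (assumed)."""
    s = sorted(values)
    half = len(s) // 2
    lower, upper = s[:half], s[len(s) - half:]
    return (s[0], statistics.median(lower), statistics.median(s),
            statistics.median(upper), s[-1])

for name, col in (("Girl", girls), ("Boy", boys), ("Difference", diffs)):
    # statistics.stdev is the sample standard deviation (divides by n - 1)
    print(name, statistics.mean(col), round(statistics.stdev(col), 6),
          five_number_summary(col))
```

The printed values can be compared against the entries already given in the table (for example, the Girl standard deviation and the Difference quartiles) before trusting the missing ones.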
2. As you may have noticed by now, most statistics competition questions involving inference procedures instruct you to
assume all necessary assumptions and conditions for performing the procedure are met. However, in real practice it is
essential to diligently verify them; otherwise, the procedure cannot be performed. With that said, how many of the
following require verification before performing an independent two-sample t-test to compare two group means?
i. Either random sampling from the populations of interest or random assignment to treatments in a well-designed experiment produces the data in the two samples.
ii. Either both samples show evidence of coming from normally distributed populations (in particular, no outliers or major skewness in the data) or both sample sizes are greater than 30 for the Central Limit Theorem (CLT) to apply.
iii. It is reasonable to assume that both samples and/or treatment groups are independent of each other and that all data observations within each sample and/or treatment group are independent of each other.
iv. Neither sample exceeds 10% of their respective population when sampling without replacement.
A) Exactly one
B) Exactly two
C) Exactly three
D) All four
E) NOTA
3. What sampling method did Mr. Callow use to select the subjects in his study?
A) Simple random sample
B) Stratified random sample
C) Cluster sample
D) Convenience sample
E) NOTA
4. Determine the outliers in each column among the scores for the girls, boys, and differences in scores using the 1.5 IQR
rule. Convert each of these outliers to a Z-score using the sample mean and sample standard deviation in the respective
column for each outlier. What is the sample mean of these Z-scores? Round any intermediate steps to six decimal places.
A) 0.0000
B) -0.8694
C) 0.21429
D) -0.8218
E) NOTA
5. What is the most appropriate inference procedure for Mr. Callow’s data in order to investigate the relationship between
gender and AP Literature test score among the seniors at his school?
A) Matched-pairs t-test
B) Independent two-sample t-test
C) Chi-square test of homogeneity
D) Linear regression t-test
E) NOTA
6. What is the sum of the test statistic, two-tail p-value, and degrees of freedom for the appropriate inference procedure?
A) 12.22
B) 22.50
C) 16.24
D) -2.85
E) NOTA
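For readers who want to see how a test statistic, degrees of freedom, and two-tail p-value fit together, here is a minimal sketch that assumes a paired analysis of the Difference column (that choice is an assumption of this sketch, not something the question states). The p-value is obtained by numerically integrating the Student's t density with Simpson's rule, since the standard library has no t distribution; `t_pdf` and `two_tail_p` are hypothetical helper names.

```python
import math
import statistics

diffs = [4, 4, 4, 4, 4, 3, 2, 2, 2, -4]  # Difference (Girl - Boy) column

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tail_p(t, df, steps=2000):
    """Two-tail p-value via Simpson's rule on [0, |t|]; steps must be even."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3          # area between 0 and |t|
    return 1 - 2 * central_half

n = len(diffs)
df = n - 1
# Paired t statistic: mean difference divided by its standard error
t_stat = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
p_value = two_tail_p(t_stat, df)
print(round(t_stat, 4), round(p_value, 4), df)
```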
7. Which of the following is the most appropriate conclusion at the 5% level of significance?
A) The mean difference in AP Literature score between female senior students and male senior students who are
dating at the school is significantly different from 0.
B) There is a statistically significant difference between the mean AP Literature score of female and the mean AP
Literature score of male senior students at the school.
C) The distribution of AP Literature exam scores is different for female and male seniors at the school.
D) There is a significant linear relationship between the AP Literature exam scores of female seniors and male
seniors who are dating at the school.
E) NOTA
8. Five friends (Anna, Eriel, Matt, Meghana, and Yolanda) who all passed the AP Literature exam with a score of 5 are
among the students in Mr. Callow’s class represented by the data set. Matt is the only boy in the set of five friends. A
student is randomly selected from the entire set of students in Mr. Callow’s class data set. What is the probability it is Matt
given that it is a boy who passed the exam? Note: a passing score on the exam is 3 or higher.
A) 1/5
B) 1/6
C) 1/10
D) 1/20
E) NOTA
9. Suppose the 20 senior students in Mr. Callow’s AP Literature class actually constituted an SRS from the population of
seniors at the school. If the four grade levels (freshmen, sophomores, juniors, and seniors) are all of equal size and the same 1
to 1 ratio of female to male students exists at each grade level, then what is the approximate probability that he would
have exactly 10 girls and 10 boys in his class of 20 seniors if only seniors are permitted to take AP Literature?
A) 0.125
B) 0.176
C) 0.248
D) 0.188
E) NOTA
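One way to set up this kind of calculation is as a hypergeometric probability: sampling 20 seniors without replacement from a senior class that is half girls and half boys. The sketch below assumes the 160 students split evenly into four grade levels, as the question describes; the variable names are illustrative.

```python
from math import comb

school_size = 160
seniors = school_size // 4      # four equal grade levels -> 40 seniors
girls = boys = seniors // 2     # 1-to-1 ratio -> 20 girls and 20 boys
class_size = 20

# Ways to pick 10 of the girls and 10 of the boys, over all ways to pick 20 seniors
p = comb(girls, 10) * comb(boys, 10) / comb(seniors, class_size)
print(round(p, 3))
```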
10. All statistical inference procedures are classified as either “parametric” or “non-parametric.” Non-parametric
inference procedures make no assumptions about the shape or form of the probability distribution from which the data
were drawn or about any known or unknown parameters describing the distribution. By this definition, which of the
following inference procedures are classified as non-parametric?
A) T-Tests for Means
B) Linear Regression T-Tests
C) Chi-Square Tests
D) Z-Tests for Proportions
E) NOTA
Use the following information to answer questions 11 – 21.
When two competing teams are equally matched, the probability that either team wins any game between them is the
same, namely 0.5. The NBA Championship in basketball, NHL Stanley Cup Championship in hockey, and Major League
Baseball (MLB) World Series are each awarded to the team that wins four games in a best-of-seven series. If the teams
were equally matched and we assume all games are independent, the probability that the final series ends with one of the
teams winning four straight games (this is known as a “sweep”) is computed as 2(0.5)^4 = 0.1250. Similar probability
calculations can determine the likelihood of the 7-game series lasting 5, 6, or 7 games. The following table shows the
number of games it took to decide each of the last 66 NBA Champions, the last 76 NHL Stanley Cup Champions, and the
last 92 MLB World Series Champions. NOTE: It is not necessary to complete the table in order to answer all questions.
                         NBA Championship     NHL Stanley Cup      MLB World Series
Series    Expected       Observed  Expected   Observed  Expected   Observed  Expected   Total
Length    Proportion     Count     Count      Count     Count      Count     Count      Observed
4 Games   0.1250         8         8.25       20        9.5        18        11.5       46
5 Games   ____           16        ____       18        ____       19        ____       53
6 Games   ____           24        ____       22        ____       20        ____       66
7 Games   ____           18        ____       16        ____       35        ____       69
Total     1.0000         66        66         76        76         92        92         234
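The "similar probability calculations" mentioned in the introduction can be sketched as follows: a series ends in exactly n games when the eventual champion wins game n and exactly three of the first n - 1 games. The function name `series_length_probability` and its `p` parameter are introduced here for illustration.

```python
from math import comb

def series_length_probability(games, p=0.5):
    """P(a best-of-seven series ends in exactly `games` games), independent games."""
    k = games - 1                      # games played before the deciding game
    # Either team can be the champion: it wins 3 of the first k games, then the last one
    team_a = comb(k, 3) * p**4 * (1 - p)**(k - 3)
    team_b = comb(k, 3) * (1 - p)**4 * p**(k - 3)
    return team_a + team_b

expected_proportion = {g: series_length_probability(g) for g in range(4, 8)}
print(expected_proportion)
```

With p = 0.5 the four-game case reproduces the 2(0.5)^4 = 0.1250 figure quoted above, and the four probabilities sum to 1.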
11. What is the probability that a randomly selected championship series is a 4-game sweep by one team or the other?
A) 16729/28842
B) 1/8
C) 1/4
D) 23/117
E) NOTA
12. What is the probability that a randomly selected MLB World Series Championship lasted 4 or 6 games?
A) 19/46
B) 61/117
C) 57/92
D) 527/759
E) NOTA
13. A championship finals series is randomly selected from the table. What is the probability that it is either an NBA
Championship series or that it lasted 5 or 6 games?
A) 119/234
B) 145/234
C) 11/39
D) 20/33
E) NOTA
14. Canadians love their hockey above and beyond all other sports - so much that they wish the season could last forever!
Therefore, there is nothing better than a 7-game Stanley Cup Championship series in the mind of a die-hard Canadian
hockey fan. What is the probability that a randomly selected NHL Stanley Cup Championship series lasted 7 games?
A) 43/78
B) 8/117
C) 16/69
D) 4/19
E) NOTA
15. If we assume that the past is a predictor of the future and that the results of the previous 76 NHL Stanley Cup
Championships constitute an SRS from an infinite number of possible outcomes throughout history, Canadian hockey fans
are 99% confident that between ____ and ____ of the time they will have their desired 7-game Stanley Cup finals series.
A) 16% and 26%
B) 12% and 30%
C) 9% and 33%
D) 22% and 37%
E) NOTA
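A confidence interval of this kind is commonly computed as a one-proportion z-interval. The sketch below assumes that procedure and uses the NHL column of the table (16 seven-game series out of 76); the critical value comes from the standard library's `NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

seven_game_series, n = 16, 76            # NHL observed counts from the table
p_hat = seven_game_series / n
z_star = NormalDist().inv_cdf(0.995)     # about 2.576 for 99% confidence
margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin
print(f"{lower:.1%} to {upper:.1%}")
```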
16. What is the theoretical long-run average number of games any best-of-seven championship series is expected to last?
A) 5.5000 games
B) 5.8125 games
C) 5.7500 games
D) 5.6752 games
E) NOTA
17. What is the theoretical long-run average amount of deviation from the expected value for the length of any best-of-seven championship series?
A) 1.0136 games
B) 1.0104 games
C) 1.0988 games
D) 1.0965 games
E) NOTA
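The "long-run average" and "long-run average deviation" in the last two questions are the mean and standard deviation of a discrete random variable. A minimal sketch, assuming equally matched teams so that the series-length probabilities are 2·C(n−1, 3)·(0.5)^n for n = 4, 5, 6, 7:

```python
from math import sqrt

# Series-length distribution for equally matched teams (derived, not from the table)
dist = {4: 0.125, 5: 0.25, 6: 0.3125, 7: 0.3125}

mean = sum(x * p for x, p in dist.items())                    # E[X]
variance = sum((x - mean) ** 2 * p for x, p in dist.items())  # Var[X]
sd = sqrt(variance)
print(mean, round(sd, 4))
```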
18. How many of the three sports leagues (NBA, NHL, and MLB) show statistically significant evidence at the 2.5%
level of significance that the two teams competing in the finals are not always equally matched when comparing the
observed frequency distribution to the expected frequency distribution of the finals series length for each sports league
separately? You may assume all necessary assumptions and conditions for the appropriate inference procedure are met.
A) None
B) Exactly one
C) Exactly two
D) All three
E) NOTA
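Comparing an observed frequency distribution with an expected one is typically done with a chi-square goodness-of-fit test. The sketch below assumes that procedure with 4 − 1 = 3 degrees of freedom per league, and uses the closed-form upper-tail probability that exists for 3 degrees of freedom; `chi2_sf_df3` is a helper name made up for this example.

```python
from math import erfc, exp, pi, sqrt

expected_proportion = [0.125, 0.25, 0.3125, 0.3125]   # 4, 5, 6, 7 games
observed = {
    "NBA": [8, 16, 24, 18],
    "NHL": [20, 18, 22, 16],
    "MLB": [18, 19, 20, 35],
}

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)

results = {}
for league, counts in observed.items():
    n = sum(counts)
    # Sum of (observed - expected)^2 / expected over the four series lengths
    stat = sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, expected_proportion))
    results[league] = (stat, chi2_sf_df3(stat))
    print(league, round(stat, 4), round(results[league][1], 4))
```

Each p-value can then be compared with the chosen significance level, one league at a time.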
19. Is there statistically significant evidence that the actual observed finals series length is not distributed homogeneously
among the three sports leagues? If so, which of the following is the smallest level of significance for which we can reject
the null hypothesis and support the alternative hypothesis for the appropriate statistical test? Again, you may assume all
necessary assumptions and conditions for the required inference procedure are met.
A) 10%
B) 5%
C) 2%
D) 1%
E) NOTA
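A chi-square test of homogeneity on the 4 × 3 table of observed counts is one natural reading of this question. The sketch below assumes that test, computes expected counts as (row total × column total) / grand total, and uses the closed-form tail probability available for 6 degrees of freedom; `chi2_sf_df6` is a hypothetical helper.

```python
from math import exp

observed = {            # rows: series length; columns: NBA, NHL, MLB
    4: [8, 20, 18],
    5: [16, 18, 19],
    6: [24, 22, 20],
    7: [18, 16, 35],
}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(col_totals)

stat = 0.0
for counts in observed.values():
    row_total = sum(counts)
    for o, col_total in zip(counts, col_totals):
        e = row_total * col_total / grand_total   # expected count under homogeneity
        stat += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(col_totals) - 1)  # (4 - 1)(3 - 1) = 6

def chi2_sf_df6(x):
    """Upper-tail probability of a chi-square variable with 6 degrees of freedom."""
    h = x / 2
    return exp(-h) * (1 + h + h * h / 2)

p_value = chi2_sf_df6(stat)
print(round(stat, 4), df, round(p_value, 4))
```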
20. Ironically, depending on the significance level chosen, the results from the previous two questions have the potential
to contradict each other in the sense that you may come to one conclusion when comparing the observed to the expected
count distributions for each sports league separately (as in #18) versus assessing the homogeneity of the finals series
length distribution among all three sports leagues at once (as in #19). This general phenomenon of arriving at seemingly
contradictory conclusions depending on whether categories are separated or combined is known as…
A) Gauss’s Contradiction
B) Simpson’s Paradox
C) Bernoulli’s Paradox
D) The Hawthorne Effect
E) NOTA
21. No fan of all three sports leagues wants to see a 4-game sweep since longer series are much more exciting! However,
it appears the historical record in the table provides evidence that 4-game sweeps in the three sports leagues combined are
more common than expected. Which of the following is the smallest level of significance for which we can reject the null
hypothesis and support the alternative hypothesis for the appropriate statistical test? You may assume all necessary
inference assumptions and conditions for the required statistical test are met.
A) 10%
B) 5%
C) 2%
D) 1%
E) NOTA
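Testing whether sweeps are more common than the 0.1250 expected for equally matched teams can be framed as a one-sided, one-proportion z-test on the combined totals. That framing is an assumption of this sketch; the counts come from the Total Observed column.

```python
from math import sqrt
from statistics import NormalDist

sweeps, series = 46, 234       # combined 4-game series and combined total
p0 = 0.125                     # sweep probability for equally matched teams
p_hat = sweeps / series

# Standard error uses the null proportion p0, as in a one-proportion z-test
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / series)
p_value = 1 - NormalDist().cdf(z)     # one-sided: sweeps more common than expected
print(round(z, 3), p_value)
```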
22. As it turns out, the term “correlation” does not only refer to the strength and direction of a linear relationship between
two quantitative variables as measured by the Pearson Product Moment Correlation Coefficient you have learned about so far
(for example: height vs. weight). As a matter of fact, a variety of different correlation coefficients exist for various
combinations of scale types or levels of measurement. The table below lists several other correlation types and the types
of scales or levels of measurement used by the two variables. Match each type of correlation in Column X with the
corresponding scale types or levels of measurement in Column Y to form a set of six ordered pairs and compute the
Pearson correlation coefficient between Column X and Column Y.
Column X
1. Spearman rank-order: One or both variables are ordinal
2. Phi: Both variables are naturally dichotomous (two natural categories)
3. Tetrachoric: Both variables are artificially dichotomous (a quantitative scale condensed into 2 categories)
4. Point-biserial: One variable is naturally dichotomous, one variable interval or ratio
5. Biserial: One variable is artificially dichotomous, one variable is interval or ratio
6. Gamma: one variable is nominal, one is ordinal
Column Y
1. Age group (minors under 18 or adults 18 and over) vs. frequency of Facebook use (number of posts/day)
2. Correct / incorrect on a single multiple choice test item vs. the total score (number correct) on the test
3. Age group of driver (under 18 or 18 and over) vs. age group of passenger (under 18 or 18 and over) in an accident
4. Race (White, Black, Latino, Asian, Other) vs. letter grade in AP Statistics course (A, B, C, D, or F)
5. Gender (male or female) vs. acceptance into college (yes or no)
6. AP Statistics Exam score (1, 2, 3, 4, or 5) vs. letter grade in AP Statistics course (A, B, C, D, or F)
A) -0.429
B) -0.657
C) 0.184
D) 0.432
E) NOTA
Use the following information to answer questions 23 – 26.
An application of the point-biserial and phi correlations defined in the previous question is in the area of Psychometrics
(or Psychological Test and Measurement Theory). The point-biserial correlation is used to evaluate the quality of a
multiple choice test item with respect to how well it differentiates higher total scores from lower ones. An item with a
strong positive point biserial correlation indicates a good differentiating item in the sense that higher total scores are
associated with having the item correct and lower total scores are associated with having the item incorrect. In particular,
the mean score of those who answered the item correctly is higher than the mean score of those who answered incorrectly.
The reverse is true for an item with a strong negative point-biserial correlation, thus indicating it is a poor differentiating
question. An item with a significant negative point-biserial correlation with total score is a possible indication that it may
be defective in some way in the sense that there is either a flaw in the item, or perhaps a coding error in the answer key.
Finally, a point-biserial correlation equal or close to 0 is a possible indication that either the item is too easy or too
difficult since either all (or almost all) subjects answered it correctly or all (or almost all) answered it incorrectly.
The phi correlation between individual pairs of distinct test items indicates how well the two items assess the same
concept. A positive 1 indicates perfect agreement and negative 1 indicates perfect disagreement. Pairs of items with
significant positive phi correlations close to 1 should be examined for possible redundancy while a significant negative
phi correlation close to -1 is an indication that the items are likely assessing different concepts.
If incorrect answers are coded as 0 and correct answers are coded as 1, then the phi and point-biserial correlations are both computed exactly as the Pearson correlation, with a correlation of 0 assigned to items that either everyone
answers correctly or everyone answers incorrectly when compared with either another item or the total score.
The following tables show the results for 8 AP Statistics students who took a 4 question multiple-choice quiz on
correlation followed by a partially filled in correlation matrix which contains the phi correlations between each pair of
quiz items and the point-biserial correlations between each quiz item and the total score of each student, in terms of the
total number correct on the quiz. Fill in the missing values in the correlation matrix (rounding all results to four decimal
places) and answer the questions that follow.
Student   Item 1    Item 2    Item 3    Item 4    Total Score
A         1         0         1         1         3
B         1         0         1         1         3
C         1         0         1         0         2
D         1         0         1         1         3
E         0         0         1         1         2
F         0         1         1         0         2
G         0         1         1         1         3
H         0         1         1         0         2
Variance  0.250000  0.234375  0.000000  0.234375  0.250000

Correlation Matrix
             Item 1    Item 2    Item 3    Item 4    Total Score
Item 1       1         ____      ____      0.2582    0.5000
Item 2       -0.7746   1         0.0000    ____      ____
Item 3       0.0000    0.0000    ____      ____      0.0000
Item 4       ____      -0.4667   0.0000    1         0.7746
Total Score  0.5000    ____      0.0000    ____      1
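Because the items are coded 0/1, each phi or point-biserial entry is an ordinary Pearson correlation. The sketch below applies a small `pearson` helper (a name introduced here) to the quiz data, returning 0 when a variable has no variation, as the instructions above prescribe. The two printed values can be checked against entries already filled in the matrix (Item 2 with Item 1, and Item 4 with Total Score).

```python
from math import sqrt

# Quiz results for students A-H (1 = correct, 0 = incorrect)
item1 = [1, 1, 1, 1, 0, 0, 0, 0]
item2 = [0, 0, 0, 0, 0, 1, 1, 1]
item3 = [1, 1, 1, 1, 1, 1, 1, 1]
item4 = [1, 1, 0, 1, 1, 0, 1, 0]
total = [a + b + c + d for a, b, c, d in zip(item1, item2, item3, item4)]

def pearson(x, y):
    """Pearson r; returns 0.0 when either variable has no variation (the test's rule)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)

print(round(pearson(item1, item2), 4))   # phi between Items 1 and 2
print(round(pearson(item4, total), 4))   # point-biserial for Item 4
```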
23. Let P = the strongest phi correlation from the matrix, let B = the strongest point-biserial correlation from the matrix,
and let T = the trace of the matrix. What is the sum of P + B + T?
A) 5.0000
B) 6.0328
C) 4.0000
D) 6.5492
E) NOTA
24. List the quiz items in order from the best differentiating question to the worst.
A) 4, 1, 2, 3
B) 2, 3, 1, 4
C) 3, 2, 4, 1
D) 4, 1, 3, 2
E) NOTA
25. Consider the absolute value of either a phi or point-biserial correlation from the matrix to be statistically significant at
the 5% level if the ratio of explained to unexplained variance between the pair of variables exceeds 1. Using this
criterion, what is the total number of distinct, non-redundant correlations in the matrix that are statistically significant?
A) 3
B) 1
C) 0
D) 2
E) NOTA
26. Cronbach’s Alpha is a measure of the internal consistency or reliability of a psychometric test in terms of how closely
related a set of test items are as a group. Its theoretical value varies from 0 to 1, yet when it is estimated from sample
data, it can actually take on any value less than or equal to 1, including negative values. Therefore, alpha is properly
interpreted as a lower bound for the true reliability of the test when computed from sample data. The formula for
computing alpha from a sample of scores from a K-item test is as follows:
$$\alpha = \frac{K}{K-1}\left(1 - \frac{\sum \mathrm{Var}(I)}{\mathrm{Var}(T)}\right)$$
where $\sum \mathrm{Var}(I)$ is the sum of the K item variances and $\mathrm{Var}(T)$ is the variance of the total test scores. Once computed, a common rule of
thumb for assessing the internal consistency or reliability of the test is as follows:
Cronbach’s Alpha    Internal Consistency
0.8 ≤ α ≤ 1         Good to Excellent
0.7 ≤ α < 0.8       Acceptable
0.5 ≤ α < 0.7       Poor to Questionable
α < 0.5             Unacceptable
Compute Cronbach’s Alpha using the variances listed in the quiz results table, and then use the above criterion to
assess the internal consistency of the 4-question quiz based on these 8 students’ scores. How would you rate the quiz?
A) Good to Excellent
B) Acceptable
C) Poor to Questionable
D) Unacceptable
E) NOTA
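The alpha formula translates directly into a few lines of code. The sketch below plugs in the item and total-score variances from the Variance row of the quiz table and maps the result onto the rule-of-thumb categories; `rate` is an illustrative helper, not part of any library.

```python
item_variances = [0.250000, 0.234375, 0.000000, 0.234375]   # Items 1-4
total_variance = 0.250000                                     # Total Score
k = len(item_variances)

# alpha = K/(K-1) * (1 - sum of item variances / variance of total scores)
alpha = (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

def rate(a):
    """Map an alpha value to the rule-of-thumb label from the table above."""
    if a >= 0.8:
        return "Good to Excellent"
    if a >= 0.7:
        return "Acceptable"
    if a >= 0.5:
        return "Poor to Questionable"
    return "Unacceptable"

print(alpha, rate(alpha))
```

A negative result is possible here because, as noted in the question, the sample estimate of alpha is not bounded below by 0.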
27. According to the U.S. Bureau of Labor Statistics website, the median wage for actuaries as of May 2015 was $97,070
with the lowest 10% earning less than $58,290 and the highest 10% earning more than $180,500. Based only on this
information, the distribution of actuary wages is likely (although not necessarily) ________.
A) Normal
B) skewed right
C) skewed left
D) uniform
E) NOTA
28. Dr. Robert J. Marzano is the cofounder and CEO of Marzano Research in Colorado. He is a leading researcher in
education whose firm has performed many hypothesis tests over his very long career. As a matter of fact, it is likely to
reach well into the thousands by the time he retires. Assuming independence and a 5% level of significance for each
hypothesis test his firm has performed, what is the approximate probability they committed at least one Type-I error
throughout their existence?
A) 0.05
B) 0.95
C) 0.50
D) 0.00
E) NOTA
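The probability of at least one Type-I error across independent tests follows from the complement rule. The sketch below assumes every null hypothesis tested was actually true (a Type-I error is only possible in that case) and uses 1000 as a hypothetical stand-in for "well into the thousands".

```python
def p_at_least_one_type_i(n_tests, alpha=0.05):
    """P(at least one false rejection) in n_tests independent tests of true nulls."""
    # Complement rule: 1 - P(no Type-I error in any test)
    return 1 - (1 - alpha) ** n_tests

for n_tests in (1, 10, 100, 1000):
    print(n_tests, round(p_at_least_one_type_i(n_tests), 6))
```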
29. Suppose a coin is biased towards heads with probability 0.6. Which of the following is the minimum number of flips
required to detect this positive difference with a power of at least 0.9 and a 2.5% Type-I Error rate? Round any critical
values used in your computations to the nearest hundredth and any other intermediate steps to at least six decimal places.
A) 200
B) 259
C) 369
D) 385
E) NOTA
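A common normal-approximation formula for the sample size of a one-sided, one-proportion test is n = ((z_α·√(p0·q0) + z_β·√(p1·q1)) / (p1 − p0))². The sketch below assumes that formula, a null value of 0.5 for a fair coin, and critical values rounded to hundredths as the question instructs.

```python
from math import ceil, sqrt
from statistics import NormalDist

p0, p1 = 0.5, 0.6            # fair coin (null) vs. the biased coin (alternative)
alpha, power = 0.025, 0.90

z_alpha = round(NormalDist().inv_cdf(1 - alpha), 2)   # one-sided critical value
z_beta = round(NormalDist().inv_cdf(power), 2)        # quantile matching the power

numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))
n_exact = (numerator / (p1 - p0)) ** 2
n_flips = ceil(n_exact)       # round up to the next whole flip
print(z_alpha, z_beta, round(n_exact, 4), n_flips)
```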
30. If you answered every question correctly thus far on this test, you may notice a potentially lower number of questions
for which the correct answer is “E) NOTA” than expected under the assumption all answer choices are equally likely to be
the correct answer for any given question on a multiple choice test this length. However, the exact number is not unusual
since it is well within two standard deviations of the expected value. Using this criterion, what are the lower and upper
bounds within which the number of questions on a test of this type for which the correct answer choice is “E) NOTA” is not
considered “unusual”?
A) 3.81 to 8.19
B) 1.62 to 10.38
C) 5.22 to 6.78
D) 1.71 to 10.29
E) NOTA
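The "within two standard deviations" criterion can be sketched with a binomial model: 30 questions, each with probability 1/5 of having "E) NOTA" as the correct answer. Treating the questions as independent with equally likely answer choices is the assumption stated in the question itself.

```python
from math import sqrt

n_questions = 30
p_nota = 1 / 5       # five equally likely answer choices, A through E

mean = n_questions * p_nota                       # binomial mean n*p
sd = sqrt(n_questions * p_nota * (1 - p_nota))    # binomial SD sqrt(n*p*(1-p))
lower, upper = mean - 2 * sd, mean + 2 * sd
print(round(lower, 2), round(upper, 2))
```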