March Statewide Invitational Statistics Individual

Important Instructions for this Test: Round all final answers to the appropriate decimal place as indicated by the answer choices. Round any intermediate steps as indicated or as necessary to make the final answer as accurate as possible. For example, if answer choices are rounded to hundredths place, then round your final answer to hundredths place and choose the closest option. Good luck, and as always: “NOTA” stands for “None of These Answers is correct.”

Use the following information to answer questions 1 – 9.

Mr. Callow wishes to investigate the relationship between gender and AP Literature exam score among the seniors at his small high school of only 160 students, exactly half of which are female and half are male. He uses the 20 senior students from his current AP Literature class, which has exactly 10 boys and 10 girls, and coincidentally, the ten boys and ten girls also happen to be couples that have been dating since freshman year. The results are summarized in the table below.

Couple               Girl       Boy        Difference (Girl – Boy)
A                    5          1          4
B                    5          1          4
C                    5          1          4
D                    5          1          4
E                    5          1          4
F                    4          1          3
G                    4          2          2
H                    4          2          2
I                    4          2          2
J                    1          5          -4
Mean                 ____       1.7        ____
Standard Deviation   1.229273   ____       2.460804
Maximum              5          5          4
Q3                   5          ____       4
Median               ____       1          ____
Q1                   4          1          2
Minimum              1          1          -4

1. Fill in the six missing values in the summary statistics section of the table rounding the missing sample standard deviation to six decimal places. What is the sum of these six missing values?
A) 17.951666   B) 17.887434   C) 15.951666   D) 15.887434   E) NOTA

2. As you may have noticed by now, most statistics competition questions involving inference procedures instruct you to assume all necessary assumptions and conditions for performing the procedure are met. However, in real practice it is essential to diligently verify them; otherwise, the procedure cannot be performed. With that said, how many of the following require verification before performing an independent two-sample t-test to compare two group means?
i. Either random sampling from the populations of interest or random assignment to treatments in a well-designed experiment produces the data in the two samples.
ii. Either both samples show evidence of coming from normally distributed populations (in particular, no outliers or major skewness in the data) or both sample sizes are greater than 30 for the Central Limit Theorem (CLT) to apply.
iii. It is reasonable to assume that both samples and/or treatment groups are independent of each other and that all data observations within each sample and/or treatment group are independent of each other.
iv. Neither sample exceeds 10% of their respective population when sampling without replacement.
A) Exactly one   B) Exactly two   C) Exactly three   D) All four   E) NOTA

3. What sampling method did Mr. Callow use to select the subjects in his study?
A) Simple random sample   B) Stratified random sample   C) Cluster sample   D) Convenience sample   E) NOTA

4. Determine the outliers in each column among the scores for the girls, boys, and differences in scores using the 1.5 IQR rule. Convert each of these outliers to a Z-score using the sample mean and sample standard deviation in the respective column for each outlier. What is the sample mean of these Z-scores? Round any intermediate steps to six decimal places.
A) 0.0000   B) -0.8694   C) 0.21429   D) -0.8218   E) NOTA

5. What is the most appropriate inference procedure for Mr. Callow’s data in order to investigate the relationship between gender and AP Literature test score among the seniors at his school?
A) Matched-pairs t-test   B) Independent two-sample t-test   C) Chi-square test of homogeneity   D) Linear regression t-test   E) NOTA

6. What is the sum of the test statistic, two-tail p-value, and degrees of freedom for the appropriate inference procedure?
A) 12.22   B) 22.50   C) 16.24   D) -2.85   E) NOTA

7. Which of the following is the most appropriate conclusion at the 5% level of significance?
A) The mean difference in AP Literature score between female senior students and male senior students who are dating at the school is significantly different from 0.
B) There is a statistically significant difference between the mean AP Literature score of female senior students and the mean AP Literature score of male senior students at the school.
C) The distribution of AP Literature exam scores is different for female and male seniors at the school.
D) There is a significant linear relationship between the AP Literature exam scores of female seniors and male seniors who are dating at the school.
E) NOTA

8. Five friends (Anna, Eriel, Matt, Meghana, and Yolanda) who all passed the AP Literature exam with a score of 5 are among the students in Mr. Callow’s class represented by the data set. Matt is the only boy in the set of five friends. A student is randomly selected from the entire set of students in Mr. Callow’s class data set. What is the probability it is Matt given that it is a boy who passed the exam? Note: a passing score on the exam is 3 or higher.
A) 1/5   B) 1/6   C) 1/10   D) 1/20   E) NOTA

9. Suppose the 20 senior students in Mr. Callow’s AP Literature class actually constituted a SRS from the population of seniors at the school. If the grade levels (freshmen, sophomores, juniors, and seniors) are all of equal size and the same 1 to 1 ratio of female to male students exists at each grade level, then what is the approximate probability that he would have exactly 10 girls and 10 boys in his class of 20 seniors if only seniors are permitted to take AP Literature?
A) 0.125   B) 0.176   C) 0.248   D) 0.188   E) NOTA

10. All statistical inference procedures are classified as either “parametric” or “non-parametric.” Non-parametric inference procedures make no assumptions about the shape or form of the probability distribution from which the data were drawn or about any known or unknown parameters describing the distribution.
By this definition, which of the following inference procedures are classified as non-parametric?
A) T-Tests for Means   B) Linear Regression T-Tests   C) Chi-Square Tests   D) Z-Tests for Proportions   E) NOTA

Use the following information to answer questions 11 – 21.

When two competing teams are equally matched, the probability that either team wins any game between them is the same, namely 0.5. The NBA Championship in basketball, NHL Stanley Cup Championship in hockey, and Major League Baseball (MLB) World Series are each awarded to the team that wins four games in a best-of-seven series. If the teams were equally matched and we assume all games are independent, the probability that the final series ends with one of the teams winning four straight games (this is known as a “sweep”) is computed as $2(0.5)^4 = 0.1250$. Similar probability calculations can determine the likelihood of the 7-game series lasting 5, 6, or 7 games. The following table shows the number of games it took to decide each of the last 66 NBA Champions, the last 76 NHL Stanley Cup Champions, and the last 92 MLB World Series Champions. NOTE: It is not necessary to complete the table in order to answer all questions.

                              NBA Championship      NHL Stanley Cup       MLB World Series
Series Length   Expected      Observed  Expected    Observed  Expected    Observed  Expected    Total
                Proportion    Count     Count       Count     Count       Count     Count       Observed
4 Games         0.1250        8         8.25        20        9.5         18        11.5        46
5 Games         ____          16        ____        18        ____        19        ____        53
6 Games         ____          24        ____        22        ____        20        ____        66
7 Games         ____          18        ____        16        ____        35        ____        69
Total           1.0000        66        66          76        76          92        92          234

11. What is the probability that a randomly selected championship series is a 4-game sweep by one team or the other?
A) 16729/28842   B) 1/8   C) 1/4   D) 23/117   E) NOTA

12. What is the probability that a randomly selected MLB World Series Championship lasted 4 or 6 games?
A) 19/46   B) 61/117   C) 57/92   D) 527/759   E) NOTA

13. A championship finals series is randomly selected from the table.
What is the probability that it is either an NBA Championship series or that it lasted 5 or 6 games?
A) 119/234   B) 145/234   C) 11/39   D) 20/33   E) NOTA

14. Canadians love their hockey above and beyond all other sports - so much that they wish the season could last forever! Therefore, there is nothing better than a 7-game Stanley Cup Championship series in the mind of a die-hard Canadian hockey fan. What is the probability that a randomly selected NHL Stanley Cup Championship series lasted 7 games?
A) 43/78   B) 8/117   C) 16/69   D) 4/19   E) NOTA

15. If we assume that the past is a predictor of the future and that the results of the previous 76 NHL Stanley Cup Championships constitute a SRS from an infinite number of possible outcomes throughout history, Canadian hockey fans are 99% confident that between ____ and ____ of the time they will have their desired 7-game Stanley Cup finals series.
A) 16% and 26%   B) 12% and 30%   C) 9% and 33%   D) 22% and 37%   E) NOTA

16. What is the theoretical long-run average number of games any best-of-seven championship series is expected to last?
A) 5.5000 games   B) 5.8125 games   C) 5.7500 games   D) 5.6752 games   E) NOTA

17. What is the theoretical long-run average amount of deviation from the expected value for the length of any best-of-seven championship series?
A) 1.0136 games   B) 1.0104 games   C) 1.0988 games   D) 1.0965 games   E) NOTA

18. How many of the three sports leagues (NBA, NHL, and MLB) show statistically significant evidence at the 2.5% level of significance that the two teams competing in the finals are not always equally matched when comparing the observed frequency distribution to the expected frequency distribution of the finals series length for each sports league separately? You may assume all necessary assumptions and conditions for the appropriate inference procedure are met.
A) None   B) Exactly one   C) Exactly two   D) All three   E) NOTA

19.
Is there statistically significant evidence that the actual observed finals series length is not distributed homogeneously among the three sports leagues? If so, which of the following is the smallest level of significance for which we can reject the null hypothesis and support the alternative hypothesis for the appropriate statistical test? Again, you may assume all necessary assumptions and conditions for the required inference procedure are met.
A) 10%   B) 5%   C) 2%   D) 1%   E) NOTA

20. Ironically, depending on the significance level chosen, the results from the previous two questions have the potential to contradict each other in the sense that you may come to one conclusion when comparing the observed to the expected count distributions for each sports league separately (as in #18) versus assessing the homogeneity of the finals series length distribution among all three sports leagues at once (as in #19). This general phenomenon of arriving at seemingly contradictory conclusions depending on whether categories are separated or combined is known as…
A) Gauss’s Contradiction   B) Simpson’s Paradox   C) Bernoulli’s Paradox   D) The Hawthorne Effect   E) NOTA

21. No fan of all three sports leagues wants to see a 4-game sweep since longer series are much more exciting! However, it appears the historical record in the table provides evidence that 4-game sweeps in the three sports leagues combined are more common than expected. Which of the following is the smallest level of significance for which we can reject the null hypothesis and support the alternative hypothesis for the appropriate statistical test? You may assume all necessary inference assumptions and conditions for the required statistical test are met.
A) 10%   B) 5%   C) 2%   D) 1%   E) NOTA

22.
As it turns out, the term “correlation” does not only refer to the strength and direction of a linear relationship between two quantitative variables as measured by the Pearson Product Moment Correlation Coefficient you have learned about so far (for example: height vs. weight). As a matter of fact, a variety of different correlation coefficients exist for various combinations of scale types or levels of measurement. The table below lists several other correlation types and the types of scales or levels of measurement used by the two variables. Match each type of correlation in Column X with the corresponding scale types or levels of measurement in Column Y to form a set of six ordered pairs and compute the Pearson correlation coefficient between Column X and Column Y.

Column X
1. Spearman rank-order: One or both variables are ordinal
2. Phi: Both variables are naturally dichotomous (two natural categories)
3. Tetrachoric: Both variables are artificially dichotomous (a quantitative scale condensed into 2 categories)
4. Point-biserial: One variable is naturally dichotomous, one variable is interval or ratio
5. Biserial: One variable is artificially dichotomous, one variable is interval or ratio
6. Gamma: One variable is nominal, one is ordinal

Column Y
1. Age group (minors under 18 or adults 18 and over) vs. frequency of Facebook use (number of posts/day)
2. Correct / incorrect on a single multiple choice test item vs. the total score (number correct) on the test
3. Age group of driver (under 18 or 18 and over) vs. age group of passenger (under 18 or 18 and over) in an accident
4. Race (White, Black, Latino, Asian, Other) vs. letter grade in AP Statistics course (A, B, C, D, or F)
5. Gender (male or female) vs. acceptance into college (yes or no)
6. AP Statistics Exam score (1, 2, 3, 4, or 5) vs. letter grade in AP Statistics course (A, B, C, D, or F)

A) -0.429   B) -0.657   C) 0.184   D) 0.432   E) NOTA

Use the following information to answer questions 23 – 26.

An application of the point-biserial and phi correlations defined in the previous question is in the area of Psychometrics (or Psychological Test and Measurement Theory). The point-biserial correlation is used to evaluate the quality of a multiple choice test item with respect to how well it differentiates higher total scores from lower ones. An item with a strong positive point-biserial correlation indicates a good differentiating item in the sense that higher total scores are associated with having the item correct and lower total scores are associated with having the item incorrect. In particular, the mean score of those who answered the item correctly is higher than the mean score of those who answered incorrectly. The reverse is true for an item with a strong negative point-biserial correlation, thus indicating it is a poor differentiating question. An item with a significant negative point-biserial correlation with total score is a possible indication that it may be defective in some way in the sense that there is either a flaw in the item, or perhaps a coding error in the answer key. Finally, a point-biserial correlation equal or close to 0 is a possible indication that the item is either too easy or too difficult since either all (or almost all) subjects answered it correctly or all (or almost all) answered it incorrectly. The phi correlation between individual pairs of distinct test items indicates how well the two items assess the same concept. A positive 1 indicates perfect agreement and negative 1 indicates perfect disagreement.
Pairs of items with significant positive phi correlations close to 1 should be examined for possible redundancy while a significant negative phi correlation close to -1 is an indication that the items are likely assessing different concepts. If incorrect answers are coded as 0 and correct answers are coded as 1, then the computation of both the phi and point-biserial correlations is identical to that of the Pearson correlation, with a correlation of 0 assigned to items that either everyone answers correctly or everyone answers incorrectly when compared with either another item or the total score.

The following tables show the results for 8 AP Statistics students who took a 4-question multiple-choice quiz on correlation, followed by a partially filled-in correlation matrix which contains the phi correlations between each pair of quiz items and the point-biserial correlations between each quiz item and the total score of each student, in terms of the total number correct on the quiz. Fill in the missing values in the correlation matrix (rounding all results to four decimal places) and answer the questions that follow.

Student    Item 1     Item 2     Item 3     Item 4     Total Score
A          1          0          1          1          3
B          1          0          1          1          3
C          1          0          1          0          2
D          1          0          1          1          3
E          0          0          1          1          2
F          0          1          1          0          2
G          0          1          1          1          3
H          0          1          1          0          2
Variance   0.250000   0.234375   0.000000   0.234375   0.250000

Correlation Matrix
              Item 1    Item 2    Item 3    Item 4    Total Score
Item 1        1         ____      ____      0.2582    0.5000
Item 2        -0.7746   1         0.0000    ____      ____
Item 3        0.0000    0.0000    ____      ____      0.0000
Item 4        ____      -0.4667   0.0000    1         0.7746
Total Score   0.5000    ____      0.0000    ____      1

23. Let P = the strongest phi correlation from the matrix, let B = the strongest point-biserial correlation from the matrix, and let T = the trace of the matrix. What is the sum of P + B + T?
A) 5.0000   B) 6.0328   C) 4.0000   D) 6.5492   E) NOTA

24. List the quiz items in order from the best differentiating question to the worst.
A) 4, 1, 2, 3   B) 2, 3, 1, 4   C) 3, 2, 4, 1   D) 4, 1, 3, 2   E) NOTA

25.
Consider the absolute value of either a phi or point-biserial correlation from the matrix to be statistically significant at the 5% level if the ratio of explained to unexplained variance between the pair of variables exceeds 1. Using this criterion, what is the total number of distinct, non-redundant correlations in the matrix that are statistically significant?
A) 3   B) 1   C) 0   D) 2   E) NOTA

26. Cronbach’s Alpha is a measure of the internal consistency or reliability of a psychometric test in terms of how closely related a set of test items are as a group. Its theoretical value varies from 0 to 1, yet when it is estimated from sample data, it can actually take on any value less than or equal to 1, including negative values. Therefore, alpha is properly interpreted as a lower bound for the true reliability of the test when computed from sample data. The formula for computing alpha from a sample of scores from a K-item test is as follows:

$$\alpha = \frac{K}{K-1}\left(1 - \frac{\sum \mathrm{Var}(I)}{\mathrm{Var}(T)}\right)$$

where $\sum \mathrm{Var}(I)$ is the sum of the K item variances and $\mathrm{Var}(T)$ is the variance of the total test scores. Once computed, a common rule of thumb for assessing the internal consistency or reliability of the test is as follows:

Cronbach’s Alpha     Internal Consistency
0.8 ≤ α ≤ 1          Good to Excellent
0.7 ≤ α < 0.8        Acceptable
0.5 ≤ α < 0.7        Poor to Questionable
α < 0.5              Unacceptable

Compute Cronbach’s Alpha using the variances in the table above, and then use the above criterion to assess the internal consistency of the 4-question quiz based on these 8 students’ scores. How would you rate the quiz?
A) Good to Excellent   B) Acceptable   C) Poor to Questionable   D) Unacceptable   E) NOTA

27. According to the U.S. Bureau of Labor Statistics website, the median wage for actuaries as of May 2015 was $97,070 with the lowest 10% earning less than $58,290 and the highest 10% earning more than $180,500. Based only on this information, the distribution of actuary wages is likely (although not necessarily) ________.
A) Normal   B) skewed right   C) skewed left   D) uniform   E) NOTA

28. Dr. Robert J. Marzano is the cofounder and CEO of Marzano Research in Colorado. He is a leading researcher in education whose firm has performed many hypothesis tests over his very long career. As a matter of fact, the number is likely to reach well into the thousands by the time he retires. Assuming independence and a 5% level of significance for each hypothesis test his firm has performed, what is the approximate probability they committed at least one Type-I error throughout their existence?
A) 0.05   B) 0.95   C) 0.50   D) 0.00   E) NOTA

29. Suppose a coin is biased towards heads with probability 0.6. Which of the following is the minimum number of flips required to detect this positive difference with a power of at least 0.9 and a 2.5% Type-I Error rate? Round any critical values used in your computations to the nearest hundredth and any other intermediate steps to at least six decimal places.
A) 200   B) 259   C) 369   D) 385   E) NOTA

30. If you answered every question correctly thus far on this test, you may notice a potentially lower number of questions for which the correct answer is “E) NOTA” than expected under the assumption all answer choices are equally likely to be the correct answer for any given question on a multiple choice test this length. However, the exact number is not unusual since it is well within two standard deviations of the expected value. Using this criterion, what are the lower and upper bounds of the range within which the number of questions on a test of this type for which the correct answer choice is “E) NOTA” is not considered “unusual”?
A) 3.81 to 8.19   B) 1.62 to 10.38   C) 5.22 to 6.78   D) 1.71 to 10.29   E) NOTA
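
The "within two standard deviations of the expected value" criterion in question 30 can be sketched in code. This is a minimal illustration only, assuming the count of questions keyed to one particular choice is modeled as a binomial random variable with n questions and per-question probability p. The function name and the values n = 50 and p = 0.25 are hypothetical and are not taken from the test.

```python
import math


def binomial_two_sd_interval(n: int, p: float) -> tuple[float, float]:
    """Return (mean - 2*sd, mean + 2*sd) for a Binomial(n, p) count.

    Assumption (not stated in the test): each question is keyed to the
    choice of interest independently with the same probability p.
    """
    mean = n * p                      # expected count
    sd = math.sqrt(n * p * (1 - p))   # binomial standard deviation
    return mean - 2 * sd, mean + 2 * sd


# Hypothetical example: a 50-question test with 4 equally likely choices.
low, high = binomial_two_sd_interval(n=50, p=0.25)
print(round(low, 2), round(high, 2))  # prints 6.38 18.62
```

Counts falling between the two printed bounds would not be flagged as "unusual" under this rule of thumb; substituting a different test length or number of answer choices changes only the two arguments.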