SOLUTIONS
Stat 250.3 - Second Midterm Exam
Monday, November 24, 2003

Part I: (100 points) Written Problems: Show ALL work, calculations and formulas used. Partial credit will be awarded for using the correct procedures.

Problem 1: Suppose that the age at death for all American citizens is normally distributed with mean 70.5 years and standard deviation 5.9 years. [10 points]

A. Describe the sampling distribution of $\bar{x}$ from a sample of size 100. (Distributions and parameters!)

It is Normal (exactly Normal, since X is Normal), with mean $E(\bar{x}) = \mu = 70.5$ and standard deviation $\mathrm{sd}(\bar{x}) = \sigma/\sqrt{n} = 5.9/\sqrt{100} = 0.59$.

B. Turn your description into a picture, including as much information as possible. (Draw the pdf.)

It is a normal curve centered at 70.5; 68% of the area under the curve falls between $70.5 \pm 0.59$, and 95% of the area falls between $70.5 \pm 2(0.59)$. (You should draw this curve!)

C. If I select a sample of 100 death ages of American citizens, what is the probability that I will get a sample mean greater than 72 years?

$P(\bar{x} > 72) = P\left(Z > \frac{72 - 70.5}{0.59}\right) = P(Z > 2.54) = 1 - P(Z \le 2.54) = 1 - 0.9945 = 0.0055$

Problem 2: A scientist is studying human evolution. He is particularly interested in changes in cranial capacity (measured in cubic centimeters). As part of his research the scientist is studying a sample of 25 adult Neanderthal skulls. For each skull he records the cranial capacity; the summary statistics are below: [30 points]

Descriptive Statistics: CC Skulls

Variable    N    Mean     Median   StDev
CC Skulls   25   1386.5   1388.8   106.9

Construct a 90% confidence interval for the mean cranial capacity. If the conditions are not met, STILL construct the confidence interval, AND state what additional information you would need to evaluate the conditions. Show ALL the work, and include an interpretation!

Conditions: Since n = 25 < 30, we need some information about the distribution of the population from which the data were collected.
For the conditions to be satisfied, the population distribution should be bell-shaped.

CI calculations: $\bar{x} = 1386.5$, $\mathrm{se}(\bar{x}) = s/\sqrt{n} = 106.9/\sqrt{25} = 21.38$, $t^* = 1.71$ (using Table A.2, with n - 1 = 24 df and 90% confidence level).

90% CI for the mean: $1386.5 \pm (1.71)(21.38) = (1349.94, 1423.06)$

Interpretation: We are 90% confident that the mean cranial capacity for adult Neanderthal skulls is between 1349.94 and 1423.06 cubic centimeters.

Problem 3: An employee of Consumer Reports is evaluating a candy bar contest. The wrapper of each candy bar contains either a prize or no prize. The candy bar company claims that the chance of winning a prize is 1 in 10 (that is, 10% of candy bars should have prizes). The employee suspects that the company is lying, and that the chance of winning a prize is less than this. The employee takes a random sample of 400 bars and finds that 26 of them contain a prize. Does the employee have strong evidence to conclude that the candy company lied? Construct an appropriate hypothesis test. Note: Show ALL steps including an interpretation (Step 5)! If the conditions are not met, just point it out but STILL perform the hypothesis test. [35 points]

Step 1: $H_0: p = 0.1$ and $H_a: p < 0.1$, where p = the proportion of candy bars with a prize.

Step 2: $np_0 = 400(0.1) = 40$ and $n(1 - p_0) = 360$, thus the conditions are satisfied.

Step 3: $z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{26/400 - 0.1}{\sqrt{0.1(1 - 0.1)/400}} = -2.33$, p-value $= P(Z < -2.33) = 0.0099$ (from Table A.1).

Step 4: The test is significant (the p-value is less than 0.05), thus we reject the null hypothesis.

Step 5: We have enough evidence to claim that the proportion of candy bars with a prize is less than 0.1, and thus we can claim that the company is lying.

Problem 4: A researcher is interested in the relationship between smoking and high blood pressure. He takes a random sample of 1000 adults and records their smoking status (Y/N) and blood pressure (High/Low). He then performs a hypothesis test in MINITAB.
Use the description of the problem and the output to answer the questions below: [35 points]

Test and CI for Two Proportions

X = High BP

Sample      X     N    Sample p
NonSmoke  154   821    0.187576
Smoking    50   179    0.279330

Estimate for p(ns) - p(s): -0.0917535
95% CI for p(ns) - p(s): (-0.162698, -0.0208087)
Test for p(ns) - p(s) = 0 (vs not = 0): Z = -2.53

A. Fill in the following:

Parameter of interest (use symbols): $p_{ns} - p_s$
Sample estimate (use symbols AND actual value): $\hat{p}_{ns} - \hat{p}_s = -0.09175$
Alternative Hypothesis: $H_a: p_{ns} - p_s \ne 0$
Z-Statistic: Z = -2.53

B. Calculate the p-value and write an appropriate conclusion. Be SPECIFIC to the problem at hand. Simply reject or fail to reject is NOT SUFFICIENT!

p-value $= 2P(Z > |-2.53|) = 2(0.0057) = 0.0114$

Since the p-value is less than 0.05, we can reject the null hypothesis and claim that there is a significant difference between the two proportions. Therefore, there is a significant relationship between smoking and blood pressure.

C. Explain what the p-value means. Again, be specific to this problem.

If there is no relationship between smoking and blood pressure (H0 is true), there is a 0.0114 probability that a random sample would show a difference between the proportions of non-smokers and smokers with high blood pressure that is greater than 0.0917 or less than -0.0917.

Midterm Exam Part II: (100 points) Multiple Choice Problems

SOLUTIONS: BCDAC CBBBB DDACA DDCDB

Questions 1-3. A random sample of 2,470 12th grade students in the United States is asked how often they wear seatbelts when driving, and also is asked about their typical grades in school. There are three grade categories (As and Bs, C, Ds and Fs) and five seatbelt use categories (Never, Rarely, Sometimes, Most times, and Always). The following Minitab output is for a chi-square analysis of the relationship between typical grades and seatbelt use.
Rows = typical grades in school, Columns = how often student wears seatbelt when driving

           Never  Rarely  Sometimes  Mosttmes  Always   All
A_and_B       52     128        166       298    1056  1700
C             32      93        104       128     300   657
D_and_F       18      22          8        24      41   113
All          102     243        278       450    1397  2470

Chi-Square = 126.203, DF = 8, P-Value = 0.000

1. Based on the results given above, an appropriate conclusion for a significance test is
A. The observed relationship is not statistically significant because the p-value is less than .05.
B. The observed relationship is statistically significant because the p-value is less than .05.
C. The observed relationship is statistically significant because the chi-square value is greater than 0.
D. The observed relationship is not statistically significant because the chi-square value is greater than 0.05.

2. In this problem, the connection between the chi-square value and the p-value is
A. The p-value is the area to the right of 126.203 in a chi-square distribution with df = 15.
B. The p-value is the area to the left of 126.203 in a chi-square distribution with df = 15.
C. The p-value is the area to the right of 126.203 in a chi-square distribution with df = 8.
D. The p-value is the area to the left of 126.203 in a chi-square distribution with df = 8.

3. The “expected” count for the cell “grades = A_and_B, seatbelt = Never” is
A. 52
B. 2470/15 = 164.67
C. 1700/5 = 340
D. (1700)(102)/2470 = 70.2
____________________________________________________________

Questions 4-8: Identify the proper statistical procedure for each of the following scenarios:

4. Engineers at a ceramics factory create a new process that is designed to reduce the number of flaws in the ceramic bowls the factory produces. The engineers take a sample of 100 bowls using the old method, and 100 with the new method. Each bowl is classified as Flawed/Non-Flawed. Is the new method superior to the old one?
A. Test of 2-proportions.
B. Test of 2-means.
C. CI for 2-means.
D. Chi-Square Test.

5.
There is a debate among scientists about whether the ozone hole over Antarctica is getting smaller. Scientists know that in 1995 the mean ozone concentration over Antarctica was 200 ppm (parts per million). In January of 2003 they took a sample of 100 ozone measurements over Antarctica. Has the average ozone level increased from 1995?
A. Test of 1-proportion.
B. CI for 2-proportions.
C. Test of 1-mean.
D. Test of 2-means.

6. A Stat 200 student is interested in estimating the proportion of all PSU students who favor legalization of marijuana.
A. CI for 1-mean.
B. Test of 1-mean.
C. CI for 1-proportion.
D. Test for 1-proportion.

7. Researchers record both the smoking status (Smoke, Don’t Smoke) and blood pressure (recorded as Low, Good, High) from 200 PSU student volunteers. Is there a relationship between Smoking Status and Blood Pressure?
A. Test of 2-means.
B. Chi-Square test.
C. Test of 2-proportions.
D. CI for 2-proportions.

8. It is known that for right-handed people, the dominant (right) hand tends to be stronger. For left-handed people who live in a world designed for right-handed people, the same may not be true. To test this, muscle strength was measured on the right and left hands of a random sample of 15 left-handed men. Is the dominant hand of left-handed people stronger than the right hand?
A. A two-sample t-test.
B. A paired t-test.
C. Chi-Square test.
D. Test for 1-proportion.
____________________________________________________________

9. Which of the following statements is true about a parameter and a statistic for samples taken from the same population?
A. The value of the parameter varies from sample to sample.
B. The value of the statistic varies from sample to sample.
C. Both A and B are true.
D. Neither A nor B is true.

10.
Suppose a researcher is interested in answering the question, “Is the percentage of all males who use drugs different than the percentage of all females who use drugs?” Which of the following would be appropriate null and alternative hypotheses?
A. Ho: p males = p females, Ha: p males ≠ p females
B. Ho: p males = p females, Ha: p males ≠ p females
C. Ho: p males ≠ p females, Ha: p males = p females
D. Ho: µ males = µ females, Ha: µ males ≠ µ females
____________________________________________________________

Use with Questions 11-13. A null hypothesis is that the mean nose lengths of men and women are the same. The alternative hypothesis is that men have a larger nose length than women.

11. Which of the following is the correct way to state the null hypothesis?
A. $\mu_d = 0$
B. $\bar{x}_1 - \bar{x}_2 = 0$
C. $p_1 - p_2 = 0$
D. $\mu_1 - \mu_2 = 0$

12. A statistical test is done and the p-value is 0.339. Which of the following is the most appropriate way to state the conclusion?
A. The mean nose lengths of men and women are identical.
B. Men have a greater mean nose length.
C. The probability is 0.339 that men and women have the same mean nose length.
D. There is not enough evidence to say that men and women have different nose lengths.

13. Refer back to the information in question 12. Which of the following statements is correct?
A. A 95% confidence interval for the difference in means will include 0.
B. A 95% confidence interval for the difference in means will not include 0.
C. Not enough information is available to know if the interval for the difference in means will include 0.
____________________________________________________________

14. Suppose that the p-value for testing Ho: p = 0.5 vs. the alternative Ha: p < 0.5 was 0.002. If the alternative hypothesis had been Ha: p ≠ 0.5, what would the p-value of the test be?
A. 0.002
B. 0.001
C. 0.004
D.
0.5
____________________________________________________________

Questions 15 and 16. Based on sample data from the 1993 General Social Survey, a 95% confidence interval for the difference between the proportions of men and women in the United States who think marijuana should be legalized is 0.035 to 0.145. In the sample, 30% of the men favored legalization and only 21% of the women favored legalization.

15. Based on this confidence interval we can reject the null hypothesis that the population proportions are equal.
A. True
B. False
C. We cannot decide.

16. Fill in the blank. The confidence interval listed above is an estimate of _____.
A. $\mu_1 - \mu_2$
B. $\hat{p}_1 - \hat{p}_2$
C. $\bar{x}_1 - \bar{x}_2$
D. $p_1 - p_2$
____________________________________________________________

17. Suppose that a researcher writes that she found a statistically significant relationship between gender and whether or not a person “smokes”. What does that statement mean?
A. She has concluded that a relationship exists between gender and “smoking status” in the population represented by the sample.
B. She rejected the null hypothesis that the two proportions are the same when comparing the proportion that said “yes” for each gender.
C. The p-value of a significance test was less than .05 (5%).
D. All of choices A, B, and C are correct.

18. Suppose a researcher is interested in testing a new drug (Compound X) versus an older drug (Compound A). He initially designs a study that has 1000 subjects, but because of a lack of funds his sample size is reduced to 200. Which of the following is true?
A. The new study design will have more power than the original study design.
B. The new study design will have a smaller chance of a type II error compared with the original design.
C. The new study design is less powerful; it has a smaller chance of detecting a difference between Compound X and Compound A.
D. Both B and C are true.

19.
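A random sample of 600 adults is taken from a population of over one million, in order to compute a confidence interval for a proportion. If the researchers wanted to decrease the width of the confidence interval, they could:
A. Decrease the size of the population
B. Decrease the size of the sample
C. Increase the size of the population
D. Increase the size of the sample

20. Suppose that the chi-square statistic equals 10.9 for a two-way table with 4 rows and 2 columns. Which range gives the approximate p-value for this situation?
A. Less than 0.001
B. Between 0.01 and 0.025
C. Between 0.025 and 0.05
D. Between 0.10 and 0.25
____________________________________________________________

The following information might be useful:

$\mathrm{s.d.}(\hat{p}) = \sqrt{\frac{p(1 - p)}{n}}$,   $\mathrm{s.d.}(\bar{x}) = \frac{\sigma}{\sqrt{n}}$

$\text{Expected} = \frac{\text{Row total} \times \text{Column total}}{\text{Total sample size } n}$,   $\chi^2 = \sum_{\text{all cells}} \frac{(\text{Obs.} - \text{Exp.})^2}{\text{Exp.}}$,   and df = (r-1)x(c-1)

Special case for 2x2 tables: $\chi^2 = \frac{n(AD - BC)^2}{R_1 R_2 C_1 C_2}$

Inference: One Mean (1-sample t)
  Parameter: $\mu$ or $\mu_d$
  Statistic: $\bar{x}$ or $\bar{d}$
  Standard Error: $\frac{s}{\sqrt{n}}$ or $\frac{s_d}{\sqrt{n}}$
  Multiplier: $t^*$, df = n-1
  Test Statistic: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ or $t = \frac{\bar{d} - 0}{s_d/\sqrt{n}}$

Inference: Difference of two means (2-sample t)
  Parameter: $\mu_1 - \mu_2$
  Statistic: $\bar{x}_1 - \bar{x}_2$
  Standard Error: $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
  Multiplier: $t^*$, df = min(n1-1, n2-1)
  Test Statistic: $t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

Inference: One proportion
  Parameter: $p$
  Statistic: $\hat{p}$
  Standard Error: $\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
  Multiplier: $z^*$
  Test Statistic: $z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}}$

Inference: Difference of two proportions
  Parameter: $p_1 - p_2$
  Statistic: $\hat{p}_1 - \hat{p}_2$
  Standard Error: $\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
  Multiplier: $z^*$
  Test Statistic: $z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}(1 - \hat{p})}{n_1} + \frac{\hat{p}(1 - \hat{p})}{n_2}}}$, where $\hat{p} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$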
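
As a cross-check of the formula sheet, the following Python sketch evaluates three of its formulas on numbers that appear in this exam: the one-proportion z statistic (Problem 3), a two-proportion z statistic (Problem 4), and the chi-square statistic with its expected counts (Questions 1-3). It is only an illustration, not part of the exam: the function names are hypothetical, and it assumes Python 3.8 or later for `statistics.NormalDist`. The two-proportion helper divides by the unpooled standard error from the sheet's Standard Error entry, an assumption made here because that choice reproduces the Z = -2.53 shown in the Problem 4 output; the pooled test statistic on the sheet gives a somewhat different value.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def one_proportion_z(successes, n, p0):
    """z = (p_hat - p0) / sqrt(p0 (1 - p0) / n), as on the formula sheet."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


def two_proportion_z_unpooled(x1, n1, x2, n2):
    """(p1_hat - p2_hat) divided by the unpooled standard error (assumption)."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se


def chi_square(table):
    """Return (statistic, df) for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected = row total * column total / total sample size
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Problem 3: 26 prizes in 400 bars, H0: p = 0.1, Ha: p < 0.1 (left tail)
z3 = one_proportion_z(26, 400, 0.10)
p3 = STD_NORMAL.cdf(z3)

# Problem 4: non-smokers 154/821 vs smokers 50/179, two-sided alternative
z4 = two_proportion_z_unpooled(154, 821, 50, 179)
p4 = 2 * STD_NORMAL.cdf(-abs(z4))

# Questions 1-3: rows = grades (A_and_B, C, D_and_F),
# columns = seatbelt use (Never, Rarely, Sometimes, Mosttmes, Always)
seatbelt_counts = [
    [52, 128, 166, 298, 1056],
    [32, 93, 104, 128, 300],
    [18, 22, 8, 24, 41],
]
chi2, df = chi_square(seatbelt_counts)
expected_ab_never = 1700 * 102 / 2470  # Question 3, choice D

print(f"Problem 3: z = {z3:.2f}, p-value = {p3:.4f}")
print(f"Problem 4: z = {z4:.2f}, p-value = {p4:.4f}")
print(f"Questions 1-3: chi-square = {chi2:.3f}, df = {df}")
print(f"Question 3: expected count = {expected_ab_never:.1f}")
```

The p-values printed here come from the exact normal distribution rather than from Table A.1, so they can differ from the table-based answers in the last digit or two.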