Introduction to Statistics and Data Analysis
Chapter 10 – Hypothesis Test Basics

Criminal Trials in the United States

The jury is always told that the defendant is “innocent until proven guilty”.

1. What must a member of the jury assume about the defendant at the beginning of the trial? This is the null hypothesis.
   H0: _______________________ (HINT: one word)

2. It is the prosecuting attorney’s job to present evidence to the jury. If there is enough evidence (“beyond a reasonable doubt”), then the jury will convict the defendant of the crime. If the defendant is convicted, the jury is rejecting the null hypothesis (above) and saying that the defendant is _________. This is the alternative hypothesis.
   Ha: _______________________

3. When the jury convicts someone of a crime, their verdict is GUILTY. Is this “Reject H0” OR “Fail to Reject H0”?

4. If the jury fails to convict someone of a crime, their verdict is NOT GUILTY. Is this “Reject H0” OR “Fail to Reject H0”? How does the verdict of “not guilty” differ from “innocent”?

5. Sometimes the jury makes a correct decision and sometimes the jury makes a mistake.
   a. When H0 is true, but we reject it based on the sample evidence, this is an error. We call it a Type I error. Write a sentence describing a Type I error in the U.S. criminal justice system.
   b. When H0 is false, but we fail to reject it based on the sample evidence, this is also an error. We call it a Type II error. Write a sentence describing a Type II error in the U.S. criminal justice system.

Chapter 10 Activities Worksheets

Put each of the following in the correct place in the table below: Type I Error, Type II Error, Correct Decision, Correct Decision.

                                Decision Based on Evidence (Data)
TRUTH (Unknown)                 Reject H0           Fail to Reject H0
H0 is true                      ____________        ____________
H0 is false (and Ha is true)    ____________        ____________

Medical Testing

Medical tests have been developed to detect many serious diseases (such as cancer and HIV). A medical test is designed to give correct results as often as possible.
That is, the goal is to minimize the occurrence of “false positives” and “false negatives”. A doctor starts by assuming that a patient is healthy (no disease), then looks for evidence to contradict that assumption. If the patient has a negative test result, the doctor continues to assume that the patient is healthy. If the patient has a positive test result, the doctor concludes that the patient has a disease.

A. State H0 and Ha.
B. When will the doctor Reject H0?
C. When will the doctor Fail to Reject H0?
D. What kind of an error is a “false positive”? EXPLAIN.
E. What kind of an error is a “false negative”? EXPLAIN.
F. What are the consequences of a false positive? Of a false negative?

Solution to Hypothesis Test Basics

Criminal Trials in the United States

1. H0: Defendant Is Innocent
2. Ha: Defendant Is Guilty
3. Conviction = Guilty Verdict = “Reject H0”
4. Failure to Convict = NOT Guilty Verdict = “Fail to Reject H0”
   “Not Guilty” indicates that there was not enough evidence to convict the defendant. This verdict makes no statement about whether or not the defendant committed the crime. “Innocent” indicates that the defendant did not commit the crime.
5. a. Type I Error = Guilty Verdict When Defendant Is Innocent
   b. Type II Error = NOT Guilty Verdict When Defendant Is Guilty

                                Decision Based on Evidence (Data)
TRUTH (Unknown)                 Reject H0           Fail to Reject H0
H0 is true                      Type I Error        Correct Decision
H0 is false (and Ha is true)    Correct Decision    Type II Error

Medical Testing

A. H0: Patient Is Healthy
   Ha: Patient Has Disease
B. Doctor rejects H0 when test result is positive.
C. Doctor fails to reject H0 when test result is negative.
D. A false positive is a Type I error (reject H0 when H0 is true).
E. A false negative is a Type II error (fail to reject H0 when H0 is false).
F. With a false positive, a person thinks they have a disease (and may start treatment) when they are healthy.
   With a false negative, a person doesn’t know they have a disease (and doesn’t start treatment).

Introduction to Statistics and Data Analysis
Chapter 10 – Type I and Type II Errors

In the Movies…

A movie critic claims that, among children’s movies that show the use of tobacco, the mean exposure time is less than 2 minutes.

i. Identify the population type and describe the population characteristic (in words).
ii. State H0 and Ha.
iii. Describe a Type I error IN CONTEXT.
iv. Describe a Type II error IN CONTEXT.

Potential Side Effects…

A researcher claims that over 1% of the people who take the drug Lipitor experience flu-like symptoms.

i. Identify the population type and describe the population characteristic (in words).
ii. State H0 and Ha.
iii. Describe a Type I error IN CONTEXT.
iv. Describe a Type II error IN CONTEXT.

Solution to Type I and Type II Errors

1. Tobacco Exposure
   i. Numerical, μ = mean tobacco exposure time in children’s movies that show the use of tobacco (in minutes)
   ii. H0: μ = 2, Ha: μ < 2
   iii. The mean tobacco exposure time is (at least) 2 minutes, but the data leads us to believe that it is less than 2 minutes.
   iv. The mean tobacco exposure time is less than 2 minutes, but the data leads us to believe that it is (at least) 2 minutes.

2. Lipitor
   i. Categorical (S/F = experience flu-like symptoms/don’t experience flu-like symptoms), π = proportion of all people taking Lipitor who experience flu-like symptoms
   ii. H0: π = 0.01, Ha: π > 0.01
   iii. (At most) 1% of all people taking Lipitor experience flu-like symptoms, but the data leads us to believe that more than 1% do.
   iv. More than 1% of all people taking Lipitor experience flu-like symptoms, but the data leads us to believe that (at most) 1% do.

Introduction to Statistics and Data Analysis
Chapter 10 – Computing the P-Value

Practice finding probabilities for z and t… SHOW YOUR WORK!
You may use the z and t tables OR you may use your calculator, e.g.
P(z < –1.07) = normalcdf(–∞, –1.07, 0, 1) OR P(t with 14 df > 2.52) = tcdf(2.52, ∞, 14)

1. Finding probability for z.
   A. P(z > 1.65)
   B. P(z < –0.94)
   C. P(z < –2.59 OR z > +2.59)

2. Finding probability for t with n = 10
   How many degrees of freedom should you use for t? _______
   A. P(t > 2.33)
   B. P(t < –1.50)
   C. P(t < –2.05 OR t > +2.05)

3. Finding probability for t with n = 20
   How many degrees of freedom should you use for t? _______
   A. P(t > 1.86)
   B. P(t < –2.45)
   C. P(t < –1.37 OR t > +1.37)

Television Viewers

The demographics of television viewers are an important factor in selling advertising time. The RX pharmaceutical company would like to market a new acid-reflux medication to consumers under the age of 50. They are considering buying advertising time on the cable channel MSNBC if they find evidence that the average age of MSNBC viewers is under 50 years.

A. Determine the population type AND describe the population characteristic (in words).
B. State H0 and Ha.
C. Is this a z or a t test statistic?
D. Suppose that a random sample of 60 MSNBC viewers had a test statistic value of –1.83. Compute the p-value.
E. Based on your p-value from part D, is data at least as inconsistent with H0 as our sample likely to occur when H0 is true? EXPLAIN.

Female Students

At Rochester Institute of Technology, 34% of the students are female. The Department of Mathematics and Statistics would like to know if the Data Analysis course has a different percentage of female students.

A. Determine the population type AND describe the population characteristic (in words).
B. State H0 and Ha.
C. Is this a z or a t test statistic?
D. A random sample of 36 Data Analysis students had a test statistic value of 0.97. What is the p-value?
E. Based on your p-value from part D, is data at least as inconsistent with H0 as our sample likely to occur when H0 is true?
   EXPLAIN.

Solution to Computing the P-Value

Probabilities

1. z ~ N(0, 1)
   A. P(z > 1.65) = normalcdf(1.65, ∞, 0, 1) = 0.0495
   B. P(z < –0.94) = normalcdf(–∞, –0.94, 0, 1) = 0.1736
   C. P(z < –2.59 OR z > +2.59) = 1 – normalcdf(–2.59, 2.59, 0, 1) = 0.0096

2. t with df = 9
   A. P(t > 2.33) = tcdf(2.33, ∞, 9) = 0.0224
   B. P(t < –1.50) = tcdf(–∞, –1.50, 9) = 0.0839
   C. P(t < –2.05 OR t > +2.05) = 1 – tcdf(–2.05, 2.05, 9) = 0.0706

3. t with df = 19
   A. P(t > 1.86) = tcdf(1.86, ∞, 19) = 0.0392
   B. P(t < –2.45) = tcdf(–∞, –2.45, 19) = 0.0121
   C. P(t < –1.37 OR t > +1.37) = 1 – tcdf(–1.37, 1.37, 19) = 0.1867

Television Viewers

A. Numerical, μ = mean age of all MSNBC viewers (in years)
B. H0: μ = 50, Ha: μ < 50
C. t
D. P(t with 59 df ≤ –1.83) = tcdf(–∞, –1.83, 59) = 0.0361
E. No, data at least as inconsistent with H0 as the sample in part D is not likely to occur when H0 is true (only a 3.61% chance).

Female Students

A. Categorical (S/F = female/male), π = proportion of all Data Analysis students who are female
B. H0: π = 0.34, Ha: π ≠ 0.34
C. z
D. P(z ≤ –0.97 OR z ≥ 0.97) = 1 – normalcdf(–0.97, 0.97, 0, 1) = 0.3321
E. Yes, data at least as inconsistent with H0 as the sample in part D is fairly likely to occur when H0 is true (33.21% chance).

Introduction to Statistics and Data Analysis
Chapter 10 – Testing Hypotheses About π

Left-Handedness Among the Elderly

Research indicates that 10% of all people are left-handed. A study of 1650 people age 65 and older contained only 83 lefties (“British Survey of Left-Handedness”, N. Bradley, The Graphologist, 1992). Does this data provide evidence that the proportion of elderly people who are left-handed is smaller than the proportion in the general population?

A. POPULATION
   - Determine the population type
   - Describe (in words) the population characteristic
   - State H0 and Ha (using μ or π)

B. STATISTICAL METHOD
   - Set a reasonable level for α
   - Write the formula of the test statistic (using the hypothesized value from H0)

C.
SAMPLE
   - Describe the sample:
     o For numerical data, determine n, x̄, and s.
     o For categorical data, determine n and p.
   - Check that the sample meets the necessary assumptions.

D. STATISTICAL RESULTS
   - Compute the value of the test statistic using the formula from part B
   - Compute the p-value.

E. CONCLUSION
   - Reject H0 OR Fail to Reject H0
   - Make a concluding statement

Solution to Testing Hypotheses About π

A. Categorical (S/F = left-handed/right-handed)
   π = proportion of left-handed people among those age 65 and older
   H0: π = 0.10
   Ha: π < 0.10

B. α = 0.05
   $z = \dfrac{p - 0.10}{\sqrt{\dfrac{0.10(1 - 0.10)}{n}}}$

C. n = 1650 and p = 83/1650 = 0.0503
   Assume that this is a random sample of all people 65 and older.
   nπ0 = 1650(0.1) = 165 AND n(1 – π0) = 1650(1 – 0.1) = 1485
   nπ0 ≥ 10 and n(1 – π0) ≥ 10, so n is large (and p is approximately normal).
   The sample size is small compared to the population (all people age 65 and older) size.

D. $z = \dfrac{0.0503 - 0.10}{\sqrt{\dfrac{0.10(1 - 0.10)}{1650}}} = -6.73$
   p-value = P(z < –6.73) = 8.59E-12

E. Is p-value ≤ α? YES, so we REJECT H0.
   The data provides sufficient evidence to conclude that the proportion of elderly people who are left-handed is smaller than the proportion in the general population.

NOTE: Why are there fewer lefties among the elderly? Bradley (1992) says, “One interpretation of this, which has appeared in the popular press, could be that left-handers die earlier than right-handers… An alternative view, however, is that older people were at school during the period when children were often being forced into the right-handed mould, and lefthandedness was suppressed. The wisdom of this unnaturalness was being questioned at the time, and so not all children were subjected to it, but it took time for the more liberated view to prevail completely.”

Introduction to Statistics and Data Analysis
Chapter 10 – 1-PropZTest

A recent article in Chance Magazine (L.
Evans, 2006) states that, “For every age, all the way through the mid-90s, male [driving] fatalities are typically 3 to 5 times that of female fatalities.” In other words, at least 75% of driving fatalities are male. The data for the article (www.scienceservingsociety.com/Dr.xls) indicates that, in 2003, 414 male and 120 female 20-year-old drivers were killed while traveling alone. Consider this data to be a random sample of all fatal crashes for 20-year-old drivers traveling alone. Does the data provide sufficient evidence to conclude that more than 75% of all fatal crashes for 20-year-old drivers traveling alone involve male drivers?

A. POPULATION
   - Determine the population type
   - Describe (in words) the population characteristic
   - State H0 and Ha (using μ or π)

B. STATISTICAL METHOD
   - Set a reasonable level for α
   - Write the formula of the test statistic (using the hypothesized value from H0)

C. SAMPLE
   - Describe the sample:
     o For numerical data, determine n, x̄, and s.
     o For categorical data, determine n and p.
   - Check that the sample meets the necessary assumptions.

D. STATISTICAL RESULTS
   - Compute the value of the test statistic and the p-value using the 1-PropZTest on your calculator
     o TI-83/84: Press STAT, choose TESTS (scroll down to find 1-PropZTest)
     o TI-89: In the Stats/List Editor, press F6 (Tests) and select 1-PropZTest

E. CONCLUSION
   - Reject H0 OR Fail to Reject H0
   - Make a concluding statement

Solution to 1-PropZTest

A. Categorical (S/F = male driver/female driver)
   π = proportion of male drivers among all fatal crashes for 20-year-old drivers traveling alone
   H0: π = 0.75
   Ha: π > 0.75

B. α = 0.05
   $z = \dfrac{p - 0.75}{\sqrt{\dfrac{0.75(1 - 0.75)}{n}}}$

C.
n = 414 + 120 = 534 and p = 414/534 = 0.7753
   Assume that this is a random sample of all such crashes.
   nπ0 = 534(0.75) = 400.5 AND n(1 – π0) = 534(1 – 0.75) = 133.5
   nπ0 ≥ 10 and n(1 – π0) ≥ 10, so n is large (and p is approximately normal).
   The sample size is small compared to the population (all fatal crashes for 20-year-old drivers traveling alone over time) size.

D. 1-PropZTest: z = 1.35, p-value = 0.0886

E. Is p-value ≤ α? NO, so we FAIL TO REJECT H0.
   The data does NOT provide sufficient evidence to conclude that the proportion of male drivers among all fatal crashes for 20-year-old drivers traveling alone is greater than 0.75.

Introduction to Statistics and Data Analysis
Chapter 10 – Testing Hypotheses About μ

A nutritionist claims that ready-to-eat breakfast cereal has about 100 calories per ounce, on average. A random sample of twelve ready-to-eat cereals provided the nutritional information in the table below:

Cereal Name                    Calories Per Serving   Serving Size (grams)   Calories Per Ounce
Kellogg's Raisin Bran          190                    59                     91.14
Kellogg's Cocoa Krispies       120                    31
Kellogg's Corn Flakes          100                    28
Post Honey Bunches of Oats     130                    32
Post Shredded Wheat            170                    49
Post Honey Comb                120                    32
Quaker Life                    120                    32
Quaker Puffed Rice             50                     14
General Mills Cheerios         110                    30
General Mills Lucky Charms     110                    27
General Mills Wheaties         100                    27
General Mills Wheat Chex       160                    47

Because 1 ounce ≈ 28.3 grams, we can compute the calories per ounce as follows:

$\text{Calories Per Ounce} = \dfrac{\text{Calories Per Serving}}{\text{Serving Size (grams)}} \times \dfrac{28.3\ \text{grams}}{1\ \text{ounce}}$

For example, Kellogg’s Raisin Bran has 91.14 calories per ounce:

$\text{Calories Per Ounce} = \dfrac{190\ \text{calories}}{59\ \text{grams}} \times \dfrac{28.3\ \text{grams}}{1\ \text{ounce}} = 91.14$

1. Compute the calories-per-ounce values for the remaining cereals in the sample. Write your results in the table above.
2. Determine if the sample provides sufficient evidence to contradict the nutritionist’s claim (using steps A – E below).

A.
POPULATION
   - Determine the population type
   - Describe (in words) the population characteristic
   - State H0 and Ha (using μ or π)

B. STATISTICAL METHOD
   - Set a reasonable level for α
   - Write the formula of the test statistic (using the hypothesized value from H0)

C. SAMPLE
   - Describe the sample:
     o For numerical data, determine n, x̄, and s.
     o For categorical data, determine n and p.
   - Check that the sample meets the necessary assumptions.

D. STATISTICAL RESULTS
   - Compute the value of the test statistic using the formula from part B
   - Compute the p-value.

E. CONCLUSION
   - Reject H0 OR Fail to Reject H0
   - Make a concluding statement

Solution to Testing Hypotheses About μ

1. Computing Calories Per Ounce

Cereal Name                    Calories Per Ounce
Kellogg's Raisin Bran          91.14
Kellogg's Cocoa Krispies       109.55
Kellogg's Corn Flakes          101.07
Post Honey Bunches of Oats     114.97
Post Shredded Wheat            98.18
Post Honey Comb                106.13
Quaker Life                    106.13
Quaker Puffed Rice             101.07
General Mills Cheerios         103.77
General Mills Lucky Charms     115.30
General Mills Wheaties         104.81
General Mills Wheat Chex       96.34

2. Hypothesis Test

A. Numerical (calories per ounce)
   μ = mean calories per ounce for all ready-to-eat cereals
   H0: μ = 100
   Ha: μ ≠ 100

B. α = 0.05
   $t = \dfrac{\bar{x} - 100}{s/\sqrt{n}}$

C. n = 12, x̄ = 104.04, s = 7.16
   This is a random sample of all ready-to-eat cereals.
   n is small (< 30), so check a normal probability plot of the sample.

   [Figure: Normal probability plot of Calories Per Ounce (Percent vs. Calories Per Ounce). Mean = 104.0, StDev = 7.158, N = 12, AD = 0.184, P-Value = 0.886]

   The normal probability plot is approximately a straight line, so the distribution of the population is approximately normal (and x̄ is normal).

D. $t = \dfrac{104.04 - 100}{7.16/\sqrt{12}} = 1.95$
   p-value = P(t with 11 df > 1.95 OR t < –1.95) = 0.077

E. Is p-value ≤ α?
   NO, so we FAIL TO REJECT H0.
   The data does NOT provide sufficient evidence to conclude that the mean number of calories per ounce for all ready-to-eat cereals differs from 100. (Not sufficient evidence to contradict the nutritionist’s claim.)

Introduction to Statistics and Data Analysis
Chapter 10 – T-Test

Standard bracelet size is 7 inches for women and 8 inches for men (according to Reed’s Jewelers). Do these sizes accommodate the average wrist size? In other words, is the average wrist size of all adults less than 7 inches (17.8 cm)?

Obtain a sample from the students in class: Using a metric tape measure, determine the size of each student’s wrist to the nearest 0.1 cm. Write the wrist sizes on the board.

Does this data provide sufficient evidence to conclude that the average wrist size for all adults is less than 17.8 cm?

A. POPULATION
   - Determine the population type
   - Describe (in words) the population characteristic
   - State H0 and Ha (using μ or π)

B. STATISTICAL METHOD
   - Set a reasonable level for α
   - Write the formula of the test statistic (using the hypothesized value from H0)

C. SAMPLE
   - Describe the sample:
     o For numerical data, determine n, x̄, and s.
     o For categorical data, determine n and p.
   - Check that the sample meets the necessary assumptions.

D. STATISTICAL RESULTS
   - Compute the value of the test statistic and the p-value using the T-Test on your calculator
     o TI-83/84: Press STAT, choose TESTS, and select T-Test
     o TI-89: In the Stats/List Editor, press F6 (Tests) and select T-Test

E. CONCLUSION
   - Reject H0 OR Fail to Reject H0
   - Make a concluding statement

Solution to T-Test

An example… Students in an introductory statistics class measured their wrists to the nearest 0.1 cm. The 34 students had an average wrist size of 16.56 cm, with a standard deviation of 1.30 cm.
Does the data provide sufficient evidence to conclude that the average wrist size for all adults is less than 17.8 cm?

A. Numerical (wrist size)
   μ = mean wrist size of all adults (in cm)
   H0: μ = 17.8
   Ha: μ < 17.8

B. α = 0.05
   $t = \dfrac{\bar{x} - 17.8}{s/\sqrt{n}}$

C. n = 34, x̄ = 16.56, s = 1.30
   Assume that this is a random sample of all adults.
   n ≥ 30, so n is large (and x̄ is approximately normal).

D. T-Test: t = –5.56, p-value = 0.000002

E. Is p-value ≤ α? YES, so we REJECT H0.
   The data provides sufficient evidence to conclude that the mean wrist size of all adults is less than 17.8 cm (7 inches).
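
The test statistics and p-values in the solutions to this chapter's worksheets can be cross-checked in software instead of on a TI calculator. The following is a minimal sketch in Python using only the standard library; it is not part of the original worksheets. The function names (one_prop_z, t_pdf, t_upper_tail, one_sample_t) are hypothetical, and the t tail area is approximated by Simpson's rule rather than by the calculator's tcdf, so the last digits may differ slightly from the printed answers.

```python
import math
from statistics import NormalDist


def one_prop_z(successes, n, pi0):
    """Return (z, lower-tail p, upper-tail p) for a one-proportion z test."""
    p_hat = successes / n
    se = math.sqrt(pi0 * (1 - pi0) / n)  # standard error uses the hypothesized pi0
    z = (p_hat - pi0) / se
    lower = NormalDist().cdf(z)          # P(Z < z) for a standard normal Z
    return z, lower, 1 - lower


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > |t|) as 0.5 minus a Simpson's-rule integral on [0, |t|]."""
    b = abs(t)
    h = b / steps                        # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def one_sample_t(mean, sd, n, mu0):
    """Return (t, df) for a one-sample t test from summary statistics."""
    return (mean - mu0) / (sd / math.sqrt(n)), n - 1


# Left-handedness among the elderly: Ha is pi < 0.10, so use the lower tail.
z_left, p_left, _ = one_prop_z(83, 1650, 0.10)

# Driver fatalities: Ha is pi > 0.75, so use the upper tail.
z_drive, _, p_drive = one_prop_z(414, 534, 0.75)

# Cereal calories: Ha is mu != 100, so double the one-sided tail area.
t_cereal, df_cereal = one_sample_t(104.04, 7.16, 12, 100)
p_cereal = 2 * t_upper_tail(t_cereal, df_cereal)

# Wrist size: Ha is mu < 17.8 and t is negative; by symmetry P(T < t) = P(T > |t|).
t_wrist, df_wrist = one_sample_t(16.56, 1.30, 34, 17.8)
p_wrist = t_upper_tail(t_wrist, df_wrist)

print(f"left-handed: z = {z_left:.2f}, p = {p_left:.3g}")
print(f"drivers:     z = {z_drive:.2f}, p = {p_drive:.4f}")
print(f"cereal:      t = {t_cereal:.2f}, df = {df_cereal}, p = {p_cereal:.3f}")
print(f"wrist:       t = {t_wrist:.2f}, df = {df_wrist}, p = {p_wrist:.2g}")
```

Each call takes the same inputs the worksheet asks for in steps B and C (the hypothesized value, n, and the sample summary), and the choice of tail follows the direction of Ha stated in step A.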