Download Worksheet_ch10 - Germantown School District

Introduction to Statistics and Data Analysis
Chapter 10 – Hypothesis Test Basics
Criminal Trials in the United States
The jury is always told that the defendant is “innocent until proven guilty”.
1. What must a member of the jury assume about the defendant at the beginning of the trial?
This is the null hypothesis.
H0: _______________________ (HINT: one word)
2. It is the prosecuting attorney’s job to present evidence to the jury. IF there is enough
evidence (“beyond a reasonable doubt”), then the jury will convict the defendant of the crime.
If the defendant is convicted, the jury is rejecting the null hypothesis (above) and saying that the
defendant is _________. This is the alternative hypothesis.
Ha: _______________________
3. When the jury convicts someone of a crime, their verdict is GUILTY.
Is this “Reject H0” OR “Fail to Reject H0”?
4. If the jury fails to convict someone of a crime, their verdict is NOT GUILTY.
Is this “Reject H0” OR “Fail to Reject H0”?
How does the verdict of “not guilty” differ from “innocent”?
5. Sometimes the jury makes a correct decision and sometimes the jury makes a mistake.
a. When H0 is true, but we reject it based on the sample evidence, this is an error. We call it a
Type I error. Write a sentence describing a Type I error in the U.S. criminal justice system.
b. When H0 is false, but we fail to reject it based on the sample evidence, this is also an error.
We call it a Type II error. Write a sentence describing a Type II error in the U.S. criminal justice
system.
Chapter 10
Activities Worksheets
Put each of the following in the correct place in the table below…
Type I Error, Type II Error, Correct Decision, Correct Decision

                                   Decision Based on Evidence (Data)
TRUTH (Unknown)                    Reject H0           Fail to Reject H0
H0 is true                         __________          __________
H0 is false (and Ha is true)       __________          __________
Medical Testing
Medical tests have been developed to detect many serious diseases (such as cancer and HIV). A
medical test is designed to give correct results as often as possible. That is, to minimize the
occurrence of “false positives” and “false negatives”.
A doctor starts by assuming that a patient is healthy (no disease), then looks for evidence to
contradict that assumption. If the patient has a negative test result, the doctor continues to
assume that the patient is healthy. If the patient has a positive test result, the doctor concludes
that the patient has a disease.
A. State H0 and Ha.
B. When will the doctor Reject H0?
C. When will the doctor Fail to Reject H0?
D. What kind of an error is a “false positive”? EXPLAIN.
E. What kind of an error is a “false negative”? EXPLAIN.
F. What are the consequences of a false positive? Of a false negative?
Solution to Hypothesis Test Basics
Criminal Trials in the United States
1. H0: Defendant Is Innocent
2. Ha: Defendant Is Guilty
3. Conviction = Guilty Verdict = “Reject H0”
4. Failure to Convict = NOT Guilty Verdict = “Fail to Reject H0”
“Not Guilty” indicates that there was not enough evidence to convict the defendant. This
verdict makes no statement about whether or not the defendant committed the crime.
“Innocent” indicates that the defendant did not commit the crime.
5a. Type I Error = Guilty Verdict When Defendant Is Innocent
5b. Type II Error = NOT Guilty Verdict When Defendant Is Guilty
                                   Decision Based on Evidence (Data)
TRUTH (Unknown)                    Reject H0           Fail to Reject H0
H0 is true                         Type I Error        Correct Decision
H0 is false (and Ha is true)       Correct Decision    Type II Error
Medical Testing
A. H0: Patient Is Healthy Ha: Patient Has Disease
B. Doctor rejects H0 when test result is positive.
C. Doctor fails to reject H0 when test result is negative.
D. A false positive is a Type I error (reject H0 when H0 is true).
E. A false negative is a Type II error (fail to reject H0 when H0 is false).
F. With a false positive, a person thinks they have a disease (and may start treatment) when they
are healthy. With a false negative, a person doesn’t know they have a disease (and doesn’t start
treatment).
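The four outcomes in the decision table can also be written as a small lookup. This is only an illustrative sketch: the function name `classify_outcome` and its two boolean arguments are assumptions introduced here, not part of the worksheet.

```python
def classify_outcome(h0_true: bool, reject_h0: bool) -> str:
    """Name the outcome of a test given the (unknown) truth and the decision."""
    if reject_h0:
        # Rejecting a true H0 is a Type I error (a "false positive").
        return "Type I Error" if h0_true else "Correct Decision"
    # Failing to reject a false H0 is a Type II error (a "false negative").
    return "Correct Decision" if h0_true else "Type II Error"


# Medical testing: H0 = patient is healthy, reject H0 = positive test result.
false_positive = classify_outcome(h0_true=True, reject_h0=True)
false_negative = classify_outcome(h0_true=False, reject_h0=False)
print(false_positive, "|", false_negative)
```

In the criminal trial setting the same call with `h0_true=True, reject_h0=True` corresponds to convicting an innocent defendant.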
Chapter 10 – Type I and Type II Errors
In the Movies…
A movie critic claims that, among children’s movies that show the use of tobacco, the mean
exposure time is less than 2 minutes.
i. Identify the population type and describe the population characteristic (in words).
ii. State H0 and Ha
iii. Describe a Type I error IN CONTEXT.
iv. Describe a Type II error IN CONTEXT.
Potential Side Effects…
A researcher claims that over 1% of the people who take the drug Lipitor experience flu-like
symptoms.
i. Identify the population type and describe the population characteristic (in words).
ii. State H0 and Ha
iii. Describe a Type I error IN CONTEXT.
iv. Describe a Type II error IN CONTEXT.
Solution To Type I and Type II Errors
1. Tobacco Exposure
i. Numerical, μ = mean tobacco exposure time in children’s movies that show the use of
tobacco (in minutes)
ii. H0: μ = 2, Ha: μ < 2
iii. The mean tobacco exposure time is (at least) 2 minutes, but the data leads us to believe
that it is less than 2 minutes.
iv. The mean tobacco exposure time is less than 2 minutes, but the data leads us to believe
that it is (at least) 2 minutes.
2. Lipitor
i. Categorical (S/F = experience flu-like symptoms/don’t experience flu-like symptoms)
π = proportion of all people taking Lipitor who experience flu-like symptoms
ii. H0: π = 0.01, Ha: π > 0.01
iii. (At most) 1% of all people taking Lipitor experience flu-like symptoms, but the data leads
us to believe that more than 1% do.
iv. More than 1% of all people taking Lipitor experience flu-like symptoms, but the data
leads us to believe that (at most) 1% do.
Chapter 10 – Computing the P-Value
Practice finding probabilities for z and t… SHOW YOUR WORK!
• You may use the z and t tables OR
• You may use your calculator
e.g. P(z < –1.07) = normalcdf(–∞, –1.07, 0, 1)
OR P(t with 14 df > 2.52) = tcdf(2.52, ∞, 14)
1. Finding probability for z.
A. P(z > 1.65)
B. P(z < –0.94)
C. P(z < –2.59 OR z > +2.59)
2. Finding probability for t with n = 10
How many degrees of freedom should you use for t? _______
A. P(t > 2.33)
B. P(t < –1.50)
C. P(t < –2.05 OR t > +2.05)
3. Finding probability for t with n = 20
How many degrees of freedom should you use for t? _______
A. P(t > 1.86)
B. P(t < –2.45)
C. P(t < –1.37 OR t > +1.37)
Television Viewers
The demographics of television viewers are an important factor in selling advertising time. The
RX pharmaceutical company would like to market a new acid-reflux medication to consumers
under the age of 50. They are considering buying advertising time on the cable channel
MSNBC, if they find evidence that the average age of MSNBC viewers is under 50 years.
A. Determine the population type AND describe the population characteristic (in words).
B. State H0 and Ha
C. Is this a z or a t test statistic?
D. Suppose that a random sample of 60 MSNBC viewers had a test statistic value of –1.83.
Compute the p-value.
E. Based on your p-value from part D, is data at least as inconsistent with H0 as our sample
likely to occur when H0 is true? EXPLAIN.
Female Students
At Rochester Institute of Technology, 34% of the students are female. The Department of
Mathematics and Statistics would like to know if the Data Analysis course has a different
percentage of female students.
A. Determine the population type AND describe the population characteristic (in words).
B. State H0 and Ha
C. Is this a z or a t test statistic?
D. A random sample of 36 Data Analysis students had a test statistic value of 0.97. What is the
p-value?
E. Based on your p-value from part D, is data at least as inconsistent with H0 as our sample
likely to occur when H0 is true? EXPLAIN.
Solution to Computing the P-Value
Probabilities
1. z ~ N(0, 1)
A. P(z > 1.65) = normalcdf(1.65, ∞, 0, 1) = 0.0495
B. P(z < –0.94) = normalcdf(–∞, –0.94, 0, 1) = 0.1736
C. P(z < –2.59 OR z > +2.59) = 1 – normalcdf(–2.59, 2.59, 0, 1) = 0.0096
2. t with df = 9
A. P(t > 2.33) = tcdf(2.33, ∞, 9) = 0.0224
B. P(t < –1.50) = tcdf(–∞, –1.50, 9) = 0.0839
C. P(t < –2.05 OR t > +2.05) = 1 – tcdf(–2.05, 2.05, 9) = 0.0706
3. t with df = 19
A. P(t > 1.86) = tcdf(1.86, ∞, 19) = 0.0392
B. P(t < –2.45) = tcdf(–∞, –2.45, 19) = 0.0121
C. P(t < –1.37 OR t > +1.37) = 1 – tcdf(–1.37, 1.37, 19) = 0.1867
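For readers without a TI calculator, the same probabilities can be approximated in Python. This is a minimal sketch, not the calculator's algorithm: the helper names `normal_cdf`, `t_pdf`, and `t_cdf` are assumptions introduced here, the normal probabilities use the standard library's complementary error function, and the t probabilities come from integrating the t density numerically with Simpson's rule.

```python
import math


def normal_cdf(z):
    """P(Z < z) for a standard normal Z."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T < x), via Simpson's rule on [0, |x|] (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


p_1a = 1 - normal_cdf(1.65)   # 1A: upper tail for z
p_1c = 2 * normal_cdf(-2.59)  # 1C: both tails for z
p_2a = 1 - t_cdf(2.33, 9)     # 2A: upper tail, t with 9 df
p_3c = 2 * t_cdf(-1.37, 19)   # 3C: both tails, t with 19 df
for label, p in [("1A", p_1a), ("1C", p_1c), ("2A", p_2a), ("3C", p_3c)]:
    print(label, round(p, 4))
```

Two-tailed probabilities are computed as twice the lower tail, which relies on the symmetry of the z and t distributions about 0.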
Television Viewers
A. Numerical, μ = mean age of all MSNBC viewers (in years)
B. H0: μ = 50, Ha: μ < 50
C. t
D. P(t with 59 df < –1.83) = tcdf(–∞, –1.83, 59) = 0.0361
E. No, data at least as inconsistent with H0 as the sample in part D is not likely to occur when H0
is true (only a 3.61% chance)
Female Students
A. Categorical (S/F = female/male), π = proportion of all Data Analysis students who are female
B. H0: π = 0.34, Ha: π ≠ 0.34
C. z
D. P(z ≤ –0.97 OR z ≥ 0.97) = 1 – normalcdf(–0.97, 0.97, 0, 1) = 0.3320
E. Yes, data at least as inconsistent with H0 as the sample in part D is fairly likely to occur when
H0 is true (33.20% chance)
Chapter 10 – Testing Hypotheses About π
Left-Handedness Among the Elderly
Research indicates that 10% of all people are left-handed. A study of 1650 people age 65 and
older contained only 83 lefties (“British Survey of Left-Handedness”, N. Bradley, The
Graphologist, 1992). Does this data provide evidence that the proportion of elderly people who
are left-handed is smaller than the proportion in the general population?
A. POPULATION
• Determine the population type
• Describe (in words) the population characteristic
• State H0 and Ha (using μ or π)
B. STATISTICAL METHOD
• Set a reasonable level for α
• Write the formula of the test statistic (using the hypothesized value from H0)
C. SAMPLE
• Describe the sample:
  o For numerical data, determine n, X̄, and s.
  o For categorical data, determine n and p.
• Check that the sample meets the necessary assumptions.
D. STATISTICAL RESULTS
• Compute the value of the test statistic using the formula from part B
• Compute the p-value.
E. CONCLUSION
• Reject H0 OR Fail to Reject H0
• Make a concluding statement
Solution to Testing Hypotheses About π
A. Categorical (S/F = left-handed/right-handed)
π = proportion of left-handed people among those age 65 and older
H0: π = 0.10
Ha: π < 0.10
B. α = 0.05
$z = \dfrac{p - 0.10}{\sqrt{\dfrac{0.10(1 - 0.10)}{n}}}$
C. n = 1650 and p = 83/1650 = 0.0503
Assume that this is a random sample of all people 65 and older
nπ0 = 1650(0.1) = 165 AND n(1 – π0) = 1650(1 – 0.1) = 1485
nπ0 ≥ 10 and n(1 – π0) ≥ 10 so n is large (and p is normal)
The sample size is small compared to the population (all people age 65 and older) size.
D. $z = \dfrac{0.0503 - 0.10}{\sqrt{\dfrac{0.10(1 - 0.10)}{1650}}} = -6.73$
p-value = P(z < –6.73) = 8.59E-12
E. Is p-value ≤ α? YES, so we REJECT H0
The data provides sufficient evidence to conclude that the proportion of elderly people
who are left-handed is smaller than the proportion in the general population.
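The hand calculation in parts B through D can be checked with a few lines of Python. This is a sketch under the worksheet's own assumptions (random sample, large n); the function name `one_prop_z` is hypothetical, and the lower-tail probability is computed with the standard library's `math.erfc` rather than a calculator.

```python
import math


def one_prop_z(successes, n, pi0):
    """z statistic for a one-proportion test, using pi0 in the standard error."""
    p_hat = successes / n
    standard_error = math.sqrt(pi0 * (1 - pi0) / n)
    return (p_hat - pi0) / standard_error


# 83 left-handed people out of 1650, hypothesized proportion 0.10
z = one_prop_z(83, 1650, 0.10)

# Ha: pi < 0.10, so the p-value is the lower-tail area P(Z < z).
p_value = 0.5 * math.erfc(-z / math.sqrt(2))

alpha = 0.05
print("z =", round(z, 2))
print("p-value =", p_value)
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```

Using `erfc` for the tail avoids the loss of precision that `1 - cdf` would cause for a probability this small.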
NOTE: Why are there fewer lefties among the elderly? Bradley (1992) says, “One
interpretation of this, which has appeared in the popular press, could be that left-handers die
earlier than right-handers… An alternative view, however, is that older people were at school
during the period when children were often being forced into the right-handed mould, and
left-handedness was suppressed. The wisdom of this unnaturalness was being questioned at the time,
and so not all children were subjected to it, but it took time for the more liberated view to prevail
completely.”
Chapter 10 – 1PropZTest
A recent article in Chance Magazine (L. Evans, 2006) states that, “For every age, all the way
through the mid-90s, male [driving] fatalities are typically 3 to 5 times that of female fatalities.”
In other words, at least 75% of driving fatalities are male. The data for the article
(www.scienceservingsociety.com/Dr.xls) indicates that, in 2003, 414 male and 120 female
20-year-old drivers were killed while traveling alone.
Consider this data to be a random sample of all fatal crashes for 20-year-old drivers traveling
alone. Does the data provide sufficient evidence to conclude that more than 75% of all fatal
crashes for 20-year-old drivers traveling alone involve male drivers?
A. POPULATION
• Determine the population type
• Describe (in words) the population characteristic
• State H0 and Ha (using μ or π)
B. STATISTICAL METHOD
• Set a reasonable level for α
• Write the formula of the test statistic (using the hypothesized value from H0)
C. SAMPLE
• Describe the sample:
  o For numerical data, determine n, X̄, and s.
  o For categorical data, determine n and p.
• Check that the sample meets the necessary assumptions.
D. STATISTICAL RESULTS
• Compute the value of the test statistic and the p-value using the 1-PropZTest on your
calculator
  o TI-83/84: Press STAT, choose TESTS (scroll down to find 1-PropZTest)
  o TI-89: In the Stats/List Editor, press F6 (Tests) and select 1-PropZTest
E. CONCLUSION
• Reject H0 OR Fail to Reject H0
• Make a concluding statement
Solution to 1-PropZTest
A. Categorical (S/F = male driver/female driver)
π = proportion of male drivers among all fatal crashes for 20-year-old drivers traveling
alone
H0: π = 0.75
Ha: π > 0.75
B. α = 0.05
$z = \dfrac{p - 0.75}{\sqrt{\dfrac{0.75(1 - 0.75)}{n}}}$
C. n = 414 + 120 = 534 and p = 414/534 = 0.7753
Assume that this is a random sample of all such crashes
nπ0 = 534(0.75) = 400.5 AND n(1 – π0) = 534(1 – 0.75) = 133.5
nπ0 ≥ 10 and n(1 – π0) ≥ 10 so n is large (and p is normal)
The sample size is small compared to the population (all fatal crashes for 20-year-old
drivers traveling alone over time) size.
D. 1-PropZTest → z = 1.35, p-value = 0.0886
E. Is p-value ≤ α? NO, so we FAIL TO REJECT H0
The data does NOT provide sufficient evidence to conclude that the proportion of male
drivers among all fatal crashes for 20-year-old drivers traveling alone is greater than 0.75.
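A rough Python stand-in for the calculator's 1-PropZTest is sketched below. It is not the TI implementation: the function name `one_prop_z_test`, its `alternative` argument, and the built-in large-sample check are assumptions made for illustration.

```python
import math


def normal_cdf(z):
    """P(Z < z) for a standard normal Z."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def one_prop_z_test(x, n, pi0, alternative="greater"):
    """Return (p_hat, z, p_value) for a one-proportion z test."""
    # Large-sample condition used in the worksheet: both counts at least 10.
    if n * pi0 < 10 or n * (1 - pi0) < 10:
        raise ValueError("sample too small for the normal approximation")
    p_hat = x / n
    z = (p_hat - pi0) / math.sqrt(pi0 * (1 - pi0) / n)
    if alternative == "greater":
        p_value = normal_cdf(-z)           # upper tail
    elif alternative == "less":
        p_value = normal_cdf(z)            # lower tail
    elif alternative == "two-sided":
        p_value = 2 * normal_cdf(-abs(z))  # both tails
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return p_hat, z, p_value


# 414 male drivers out of 534 fatal crashes, H0: pi = 0.75, Ha: pi > 0.75
p_hat, z, p_value = one_prop_z_test(414, 534, 0.75, "greater")
reject = p_value <= 0.05
print(round(p_hat, 4), round(z, 2), round(p_value, 4), reject)
```

The `alternative` argument plays the role of the calculator's choice among "prop ≠ p0", "< p0", and "> p0".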
Chapter 10 – Testing Hypotheses About μ
A nutritionist claims that ready-to-eat breakfast cereal has about 100 calories per ounce, on
average. A random sample of twelve ready-to-eat cereals provided the following nutritional
information in the table below:
Cereal Name                   Calories Per Serving   Serving Size (grams)   Calories Per Ounce
Kellogg's Raisin Bran         190                    59                     91.14
Kellogg's Cocoa Krispies      120                    31
Kellogg's Corn Flakes         100                    28
Post Honey Bunches of Oats    130                    32
Post Shredded Wheat           170                    49
Post Honey Comb               120                    32
Quaker Life                   120                    32
Quaker Puffed Rice            50                     14
General Mills Cheerios        110                    30
General Mills Lucky Charms    110                    27
General Mills Wheaties        100                    27
General Mills Wheat Chex      160                    47

Because 1 ounce = 28.3 grams, we can compute the calories per ounce as follows:
$\text{Calories Per Ounce} = \dfrac{\text{Calories Per Serving}}{\text{Serving Size (grams)}} \times \dfrac{28.3\ \text{grams}}{1\ \text{ounce}}$
For example, Kellogg’s Raisin Bran has 91.14 calories per ounce:
$\text{Calories Per Ounce} = \dfrac{190\ \text{calories}}{59\ \text{grams}} \times \dfrac{28.3\ \text{grams}}{1\ \text{ounce}} = 91.14$
1. Compute the calories-per-ounce values for the remaining cereals in the sample. Write your
results in the table above.
2. Determine if the sample provides sufficient evidence to contradict the nutritionist’s claim
(using steps A – E below).
A. POPULATION
• Determine the population type
• Describe (in words) the population characteristic
• State H0 and Ha (using μ or π)
B. STATISTICAL METHOD
• Set a reasonable level for α
• Write the formula of the test statistic (using the hypothesized value from H0)
C. SAMPLE
• Describe the sample:
  o For numerical data, determine n, X̄, and s.
  o For categorical data, determine n and p.
• Check that the sample meets the necessary assumptions.
D. STATISTICAL RESULTS
• Compute the value of the test statistic using the formula from part B
• Compute the p-value.
E. CONCLUSION
• Reject H0 OR Fail to Reject H0
• Make a concluding statement
Solution to Testing Hypotheses About μ
1. Computing Calories Per Ounce
Cereal Name                   Calories Per Ounce
Kellogg's Raisin Bran         91.14
Kellogg's Cocoa Krispies      109.55
Kellogg's Corn Flakes         101.07
Post Honey Bunches of Oats    114.97
Post Shredded Wheat           98.18
Post Honey Comb               106.13
Quaker Life                   106.13
Quaker Puffed Rice            101.07
General Mills Cheerios        103.77
General Mills Lucky Charms    115.30
General Mills Wheaties        104.81
General Mills Wheat Chex      96.34
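The unit conversion can be applied to the whole sample at once. The sketch below uses the calories and serving sizes from the worksheet table and the worksheet's conversion factor of 28.3 grams per ounce; the variable names are illustrative, and printed values may differ from a hand calculation in the last digit because of rounding.

```python
GRAMS_PER_OUNCE = 28.3  # conversion factor used in the worksheet

# (cereal name, calories per serving, serving size in grams)
cereals = [
    ("Kellogg's Raisin Bran", 190, 59),
    ("Kellogg's Cocoa Krispies", 120, 31),
    ("Kellogg's Corn Flakes", 100, 28),
    ("Post Honey Bunches of Oats", 130, 32),
    ("Post Shredded Wheat", 170, 49),
    ("Post Honey Comb", 120, 32),
    ("Quaker Life", 120, 32),
    ("Quaker Puffed Rice", 50, 14),
    ("General Mills Cheerios", 110, 30),
    ("General Mills Lucky Charms", 110, 27),
    ("General Mills Wheaties", 100, 27),
    ("General Mills Wheat Chex", 160, 47),
]

# calories per ounce = (calories per serving / grams per serving) * grams per ounce
per_ounce = [calories / grams * GRAMS_PER_OUNCE for _, calories, grams in cereals]

for (name, _, _), value in zip(cereals, per_ounce):
    print(f"{name:28s} {value:7.2f}")
```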
2. Hypothesis Test
A. Numerical (calories per ounce)
μ = mean calories per ounce for all ready-to-eat cereals
H0: μ = 100
Ha: μ ≠ 100
B. α = 0.05
$t = \dfrac{\bar{X} - 100}{s / \sqrt{n}}$
C. n = 12, X̄ = 104.04, s = 7.16
This is a random sample of all ready-to-eat cereals.
n is small (< 30), so check a normal probability plot of the sample
[Figure: Normal probability plot of Calories Per Ounce (x-axis: Calories Per Ounce, y-axis: Percent). Mean = 104.0, StDev = 7.158, N = 12, AD = 0.184, P-Value = 0.886]
The normal probability plot is approximately a straight line, so the distribution of the
population is approximately normal (and X̄ is normal).
D. $t = \dfrac{104.04 - 100}{7.16 / \sqrt{12}} = 1.95$
p-value = P(t with 11 df > 1.95 OR < -1.95) = 0.077
E. Is p-value ≤ α? NO, so we FAIL TO REJECT H0
The data does NOT provide sufficient evidence to conclude that the mean number
of calories per ounce for all ready-to-eat cereals differs from 100. (Not sufficient
evidence to contradict the nutritionist’s claim.)
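The two-sided t test above can be reproduced from the raw data in Python. This is a sketch, not the calculator's or Minitab's method: the helpers `t_pdf` and `t_upper_tail` are hypothetical names, and the tail area is approximated by integrating the t density with Simpson's rule.

```python
import math
from statistics import mean, stdev

calories = [190, 120, 100, 130, 170, 120, 120, 50, 110, 110, 100, 160]
grams = [59, 31, 28, 32, 49, 32, 32, 14, 30, 27, 27, 47]
per_ounce = [c / g * 28.3 for c, g in zip(calories, grams)]


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, via Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


n = len(per_ounce)
x_bar = mean(per_ounce)
s = stdev(per_ounce)  # sample standard deviation (divides by n - 1)
t_stat = (x_bar - 100) / (s / math.sqrt(n))

# Ha: mu != 100, so double the tail area beyond |t|.
p_value = 2 * t_upper_tail(abs(t_stat), n - 1)
reject = p_value <= 0.05
print(round(x_bar, 2), round(s, 2), round(t_stat, 2), round(p_value, 3), reject)
```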
Chapter 10 – TTest
Standard bracelet size is 7 inches for women and 8 inches for men (according to Reed’s
Jewelers). Do these sizes accommodate the average wrist size? In other words, is the average
wrist size of all adults less than 7 inches (17.8 cm)?
Obtain a sample from the students in class:
• Using a metric tape measure, determine the size of each student’s wrist to the nearest 0.1 cm.
• Write the wrist sizes on the board.
• Does this data provide sufficient evidence to conclude that the average wrist size for all
adults is less than 17.8 cm?
A. POPULATION
• Determine the population type
• Describe (in words) the population characteristic
• State H0 and Ha (using μ or π)
B. STATISTICAL METHOD
• Set a reasonable level for α
• Write the formula of the test statistic (using the hypothesized value from H0)
C. SAMPLE
• Describe the sample:
  o For numerical data, determine n, X̄, and s.
  o For categorical data, determine n and p.
• Check that the sample meets the necessary assumptions.
D. STATISTICAL RESULTS
• Compute the value of the test statistic and the p-value using the T-Test on your
calculator
  o TI-83/84: Press STAT, choose TESTS, and select T-Test
  o TI-89: In the Stats/List Editor, press F6 (Tests) and select T-Test
E. CONCLUSION
• Reject H0 OR Fail to Reject H0
• Make a concluding statement
Solution to T-Test
An example….
Students in an introductory statistics class measured their wrists to the nearest 0.1 cm. The 34
students had an average wrist size of 16.56 cm, with a standard deviation of 1.30 cm. Does the
data provide sufficient evidence to conclude that the average wrist size for all adults is less than
17.8 cm?
A. Numerical (wrist size)
μ = mean wrist size of all adults (in cm)
H0: μ = 17.8
Ha: μ < 17.8
B. α = 0.05
$t = \dfrac{\bar{X} - 17.8}{s / \sqrt{n}}$
C. n = 34, X̄ = 16.56, s = 1.30
Assume that this is a random sample of all adults
n ≥ 30, so n is large (and X̄ is approximately normal)
D. T-Test → t = -5.56, p-value = 0.000002
E. Is p-value ≤ α? YES, so we REJECT H0
The data provides sufficient evidence to conclude that the mean wrist size of all adults is
less than 17.8 cm (7 inches).
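When only summary statistics are available, as in this example, the T-Test result can be approximated from n, X̄, and s alone. The sketch below is an illustration rather than the calculator's routine: `t_pdf` and `t_cdf` are hypothetical helper names, and the lower-tail area is found by numerically integrating the t density with Simpson's rule.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T < x), via Simpson's rule on [0, |x|] (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


# Summary statistics from the class sample; mu0 is the hypothesized mean (cm).
n, x_bar, s, mu0 = 34, 16.56, 1.30, 17.8

t_stat = (x_bar - mu0) / (s / math.sqrt(n))
p_value = t_cdf(t_stat, n - 1)  # Ha: mu < 17.8, so use the lower tail
reject = p_value <= 0.05
print("t =", round(t_stat, 2))
print("p-value =", p_value)
print("Reject H0" if reject else "Fail to reject H0")
```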