Download AP Review Inference - Hypotheses Test Key

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Psychometrics wikipedia , lookup

History of statistics wikipedia , lookup

Foundations of statistics wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

Taylor's law wikipedia , lookup

Law of large numbers wikipedia , lookup

Gibbs sampling wikipedia , lookup

Omnibus test wikipedia , lookup

Misuse of statistics wikipedia , lookup

Resampling (statistics) wikipedia , lookup

Student's t-test wikipedia , lookup

Transcript
AP Statistics
Vocabulary
Significance test
Alternative hypotheses
statistically significant
type II error
sampling distribution
chi-square goodness of fit
observed count
AP Review Inference – Hypotheses Tests
Null hypothesis
p-value
type I error
power
standard error
chi-square distribution
expected count
Notes
A significance test assesses the evidence provided by data against a null hypothesis and in favor of an
alternative hypothesis
The hypotheses are usually stated in terms of population parameters. Often, H0 is a statement of no change
or no difference. The alternative hypothesis states what we hope or suspect is true.
A one-sided alternative says that a parameter differs from the null hypothesis value in a specific direction. A
two-sided alternative says that a parameter differs from the null value in either direction.
The reasoning of a significance test is as follows. Suppose that the null hypothesis is true. If we repeated our
data production many times, would we often get data as inconsistent with H0 in the direction specified by
Ha as the data we actually have? If the data are unlikely when H0 is true, they provide evidence against
H0 and in favor of Ha .
The p-value of a test is the probability, computed supposing H0 to be true, that the statistic will take a value
at least as extreme as the observed result in the direction specified by Ha .
Small p-values indicate strong evidence against H0 . To calculate a p-value, we must know the sampling
distribution of the test statistic when H0 is true.
If the p-value is smaller than a specified value α (called the significance level), the data are statistically
significant at level α. In that case, we can reject H0 and say that we have convincing evidence for Ha . If
the p-value is greater than or equal to α, we fail to reject H0 and say that we do not have convincing
evidence for Ha .
A Type I error occurs if we reject H0 when it is in fact true. In other words, the data give convincing evidence
for Ha when the null hypothesis is correct
A Type II error occurs if we fail to reject H0 when Ha true. In other words, the data don’t give convincing
evidence for Ha even though the alternative hypothesis is correct.
In a fixed level α significance test, the probability of a Type I error is the significance level α.
Follow a four-step process when asked to carry out a hypothesis test
Hypotheses: Write a null and alternative hypothesis in terms of a parameter (population) and in context
Conditions: Usually Random, Normal and Independent be sure to show how you checked each
Calculations: Write either the name or formula of the test, test statistic and p-value. (df when
appropriate)
Conclusion: Compare your p-value and α, write whether or not you reject H0 , and state your conclusion in
the context of the problem
Confidence intervals provide additional information that significance tests do not – namely, a set of plausible
values for the true population parameter.
The power of a significance test against a specific alternative is the probability that the test will reject H0
when the alternative is true. Power measures the ability of the test to detect an alternative value of the
parameter. For a specific alternative, P(Type II error) = 1 – power
We can increase the power of a significance test by increasing the sample size, increasing the significance
level, or increasing the difference that is important to detect between the null and alternative parameter
values.
Analyze paired data by first taking the difference within each pair to produce a single sample. Then use onesample t procedures.
Don’t use t sample t procedures to compare means for paired data.
Very small differences can be highly significant (small p-value) when a test is based on a large sample. A
statistically significant difference need not be practically important.
The chi-square goodness of fit tests the null hypothesis that a categorical variable has a specified distribution
in the population of interest.
If a chi-square test finds a statistically significant result, consider doing a follow-up analysis that compares the
observed and expected counts and that looks for the largest components of the chi-square statistic.
Some studies aim to compare the distribution of a single categorical variable for each of several populations or
treatments. In such cases, researchers should take independent random samples from the populations of
interest or sue the groups in a randomized experiment. The null hypothesis is that there is no difference in
the distribution of the categorical variable for each of the populations or treatments. We use the chisquare test for homogeneity to test this hypothesis.
Other studies are designed to investigate the relationship between two categorical variables. In such cases,
researchers take a random sample from the population of interest and classify each individual based on
the two categorical variables. The chi-square test for independence tests the null hypothesis that there is
no association between the two categorical variables in the population of interest.
Least squares regression fits a straight line of the form ŷ  a  bx to data to predict a response variable y from
an explanatory variable x. Inference the this setting uses the sample regression line to estimate or test a
claim about the population regression line
Confidence intervals and significance tests for the slope of the population regression line are based on a t
distribution with n – 2 degrees of freedom.
Curved relationships between two quantitative variables can sometimes be changed into linear relationships
by transforming one or both of the variables. Once we transform the data to achieve linearity, we can fit a
least-squares regression line to the transformed data and use this linear model to make predictions.
Problems
1. Suppose you suspect a “chute” of playing cards is not fair. The chute supposedly contains 10 standard
decks shuffled together. You are interested in knowing whether there are more hearts than usual. To test
this, you deal 12 cards at random and calculate the proportion of hearts in your hand.
A. Describe the parameter of interest in this setting.
The proportion of hearts in the chute
B. Write the appropriate null and alternative hypotheses for this situation.
H0 : p  0.25 the proportion of hearts in the chute is 0.25
Ha : p  0.25 the proportion of hearts in the chute is more than 0.25
C. Suppose your deal contains 7 hearts. What is the sample proportion of hearts? Is it possible to deal 7
hearts out of 12 cards if the chute really does contain standard decks? Is it likely? Why or why not?
It is possible that you would get 7 out of 12 cards hearts making your proportion 0.5833. If in fact one
fourth of the cards are hearts, the probability of getting a sample of 7 out of 12 red cards is


7


 0.25
7 

12
  0.0038 . Since we would get 7 or more hearts out of a sample of
P  pˆ    P  z 

12 

0.25 1  0.25  


12


12 cards less than 1% of the time by chance, it is very unlikely.
D. Describe a Type I error in this setting
We conclude there are more than 50% red cards in the chute when in fact there are 50%
E. Describe a Type II error in this setting
We conclude there are 50% red cards in the chute when in fact there are more than 50%
2. Humerus bones from the same species of animal tend to have approximately the same length-to-width
ratios. When fossils of humerus bones are discovered, archeologists can often determine the species of
animal by examining these ratios. It is known that the species Molekius Primatium exhibits a mean ratio of
µ = 8.9. Suppose 41 fossils of humerus bones are unearthed at a site on Minnesota’s Iron Range, where
this species was known to have lived. Researchers are willing to view these as a random sample of all such
humerus bones. The length-to-width ratios were calculated and are listed below. Test whether the
population mean for the species that left these bones differs from 8.9 at α = 0.05.
9.73
9.17
6.66
8.87
9.2
9.93
9.84
10.48
8.71
10.89
12
9.35
6.23
9.33
8.91
9.59
10.39
9.57
9.07
9.38
8.86
9.41
9.98
11.77
9.48
9.39
9.29
9.17
9.94
8.39
6.85
8.38
9.89
8.07
8.8
8.52
8.17
8.37
10.02
8.3
11.67
Hypotheses:
H0 :   8.9 the population mean for the species that left these bones is 8.9
Ha :   8.9 the population mean for the species that left these bones differs from 8.9
Conditions:
Random: stated
Independent: it is reasonable to believe there are more than 41(10) = 410 humerus bones.
Normal: Since we have a large sample size (41 > 30), the sampling distribution is approximately
normal.
Calculations:
1 – sample t test
x   9.27  8.9
t

 1.97
s
1.199
n
41
df = 41 – 1 = 40
p  0.0558
Conclusion:
Since p    0.0558  0.05 , I fail to reject H0 . There is insufficient evidence to conclude the
population mean for the species that left these bones differs from 8.9.
3. A recent study tracked the television viewing habits of 100 randomly selected first-grade boys and 200
randomly selected first-grade girls. Each child was asked to identify their favorite TV show. The following
table summarizes the results:
Zooboomafoo
iCarly
Phineas and Ferb
Boys
20 30
30 36.67
50 33.33
Girls
70 60
80 73.33
50 66.67
Do these data provide convincing evidence that television preferences differ significantly for boys and
girls?
Hypotheses:
H0 : Television preferences are the same for boys and girls
Ha : Television preferences are different for boys and girls
Conditions:
Random: stated
Independent: it is reasonable to believe there are more than 100(10) = 1000 first grade boys and more
than 200(10) = 200 first grade girls.
All expected counts > 5 (see chart).
Calculations:
 2 Test
 
2
 observed
p  0.000064
 expected
expected
2
20  30 

2
 30  36.67

30
36.67
df = (3 – 1)(2 – 1) = 2
2
 ...  19.318
Conclusion:
Since p    0.000064  0.05 , I reject H0 . There is sufficient evidence to conclude television
preferences are different for boys and girls. A follow-up analysis suggests the biggest difference
occurs in the preferences for Phineas and Ferb, with more boys and fewer girls preferring that
show than expected
4. A study of classic authors uncovered a distinguishable speech pattern that differed from author to author.
Plato utilized this pattern in 21.4% of the passages in his works. The owner of a rare bookstore claims to
have an original Plato work, but you suspect the speech pattern occurs too frequently to be an original
Plato work. A random sample of passages from the work in question was taken and it was found that 136
of the 439 selected passages followed the speech pattern. Do these data provide convincing evidence that
the work was not written by Plato?
Hypotheses:
H0 : p  0.214 the speech pattern occurs 21.4% in this work: the work is written by Plato
Ha : p  0.214 the speech pattern occurs more than 21.4% in this work; the work is not written by
Plato
Conditions:
Random: stated
Independent: it is reasonable to believe there are more than 439(10) = 4390 passages in this work.
 136 
 136 
Normal: Since 439 
  136  10 and 439  1 
  303  10 , the sample is large enough to
439 
 439 

conclude the sampling distribution is approximately normal.
Calculations:
1 – proportion z test
136
 0.214
pˆ  p
439
z

 4.894
p 1  p 
0.214 1  0.214 
n
439
p  4.95  107
Conclusion:
Since p < any reasonable α, I reject H0 . There is sufficient evidence to conclude the speech pattern
occurs more than 21.4% in this work; the work is not written by Plato.
5. A recent study suggested that 77% of teenagers have texted while driving. A random sample of 27
teenage drivers in Atlanta was taken and 15 admitted to texting while driving. Use a 99% confidence
interval to determine whether there is convincing evidence that the population proportion of teens who
text while driving is different than 77%.
Conditions:
Random: stated
Independent: it is reasonable to believe there are more than 27(10) = 270 teenagers who drive.
 15 
 15 
Normal: Since 27    15  10 and 27  1    12  10 , the sample is large enough to conclude the
 27 
 27 
sampling distribution is approximately normal.
Calculations:
1 – proportion z interval
pˆ  z *
15  15 
1  
15
27  27 
 2.576
27
27
pˆ 1  pˆ 
n
(0.309, 0.802)
Conclusion:
I am 99 % confident the true proportion of teenagers who text while driving is between 0.309 and
0.802. Since 77% is within the interval, there is insufficient evidence to conclude the true
proportion of teenagers who text while driving is different than 77%.
6. Researchers are interested in studying the effect of sleep on exam performance. Suppose the population
of individuals who get at least 8 hours of sleep prior to an exam score an average of 96 points on the exam
with a standard deviation of 18 points. The population of individuals who get less than 8 hours of sleep
score an average of 72 points with a standard deviation of 9.4 points. Suppose 40 individuals are randomly
sampled from each population.
A. Describe the shape, center, and spread of the sampling distribution of x1  x2
 x  x   x   x  96  72  24
1
2
 x x 
1
2
1
sx21
nx1
2

sx22
nx2

182 9.42

 3.21
40
40
Since the sample size is large (40 > 30) for each group, the sampling distribution of x1  x2 is
approximately normal.
B. Find the probability of observing a difference in sample means of 2 points or more from the mean
difference of the two samples. Show your work.
P  x1  x2  22 or x1  x2  26   P  x1  x2  22   P  x1  x2  26  
 22  24   26  24 
 Pt 
  Pt 
  0.268  0.268  0.537
3.21  
3.21 

7. A study measured how fast subjects could repeatedly push a button when under the effects of caffeine.
Subjects were asked to push a button as many times as possible in two minutes after consuming a typical
amount of caffeine. During another test session, they were asked to push the button after taking a
placebo. The subjects did not know which treatment they were administered each day and the order of
the treatments was randomly assigned. The data, given in presses per two minutes for each treatment
follows. Determine whether or not caffeine results in a higher rate of beats, on average, per two-minute
period.
Subject
1
2
3
4
5
6
7
8
9
10
11
Beats caffeine 281 284 300 421 240 294 377 345 303 340 408
Beats placebo 201 262 283 290 259 291 354 346 283 391 411
Differences
80
22
17
131 -19
3
23
-1
20
-51
-3
Hypotheses:
H0 : d  0 the population mean difference between the number of beats with or without caffeine is
0. There is no difference between the number of beats with or without caffeine.
Ha : d  0 the population mean difference between the number of beats with or without caffeine is
greater than 0. Caffeine results in a higher rate of beats, on average.
Conditions:
Random: “randomly assigned” stated
Independent: it is reasonable to believe there are more
than 11(10) = 110 possible subjects.
Normal: Since we don’t have a large sample size (11 < 30)
and a graph of the differences shows outliers and
skewness, it is not safe to assume the sampling
distribution is approximately normal. This condition is not met.
Calculations:
1 – sample t test
x   20.18  0
t

 1.373
s
48.75
n
11
df = 11 – 1 = 10
p  0.0999
Conclusion:
Since p    0.0999  0.05 , I fail to reject H0 . There is insufficient evidence to conclude the
population mean difference between the number of beats with or without caffeine is greater than 0.
We cannot conclude that caffeine results in a higher rate of beats, on average
8. School officials are interested in implementing a policy that would allow students to bring their own
technology to school for academic use. There are two large high schools in a town, Lakeville North and
Lakeville South, each with 1700 students. At Lakeville North, 60% of students own technological devices
that could be used for academic purposes. 75% of students at Lakeville South own those types of devices.
The district takes an SRS of 125 students from Lakeville North and a separate SRS of 160 students at
Lakeville South. The sample proportions of students who own devices that could be used at the school are
recorded and the difference pˆS  pˆN is determined to be 0.07.
A. Describe the shape, center, and spread of the sampling distribution of pˆS  pˆN .
pˆ  pˆ  pS  pN  0.75  0.60  0.15
S
N
 pˆ  pˆ 
S
N
pS 1  pS 
nS

pN 1  pN 
nN

0.75 1  0.75
160

0.6 1  0.6 
125
 0.0556
Since the sample size is large (160(0.75) = 120 > 10, 160(1 – 0.75) = 40 > 10, 125(0.60) = 75 > 10, and
125(1 – 0.60) = 50 > 10), the sampling distribution of pˆS  pˆN is approximately normal.
B. Find the probability of getting a difference in sample proportions of 0.07 or less from the two surveys.
Show your work.
0.07  0.15 

P  pˆS  pˆN  0.07   P  z 
  0.075
0.0556 

C. Does the result in part B give you reason to doubt the study’s reported value? Explain.
Although the probability we would 0.07 or less by chance is not extremely high, it is not small enough
to give me reason to doubt the study’s reported value.
9. A school official suspects the difference in the proportion of students who own technological devices
between Lakeville North and Lakeville South high schools may be a result of a difference in the
socioeconomic status of the students in the two schools. The results of a random sampling of student
registration records indicated 28 out of 120 students at Lakeville North came from low income families
while 30 out of 150 students at Lakeville South came from low income families. Do these data provide
convincing evidence that the proportion of low income students at Lakeville North is higher than the
proportion at Lakeville South? Use a 5% significance level.
Hypotheses:
H0 : pN  pS the proportion of students from low income families is the same for both high schools
Ha : pN  pS the proportion of low income students at Lakeville North is higher than the proportion at
Lakeville South
Conditions:
Random: 2 independent random samples stated
Independent: there are 1700 students at each high school (see problem 8) which is more than
120(10) = 1200 or 150(10) = 1500 students.
28 
 28 
 30 

Normal: Since 120 
  28  10 , 120  1 
  30  10 and
  92  10 , 150 
 120 
 150 
 120 
30 

150  1 
  120  10 , the samples are large enough to conclude the sampling distribution is
 150 
approximately normal.
Calculations:
2 – proportion z test
28
30

ˆpN  pˆS
120 150
z

 0.66
28 
28  30 
30 
pˆN 1  pˆN  pˆS 1  pˆS 

1 

1 

120  120  150  150 
nN
nS

120
150
p  0.2538
Conclusion:
Since p > α (0.25 > 0.05), I fail to reject H0 . There is insufficient evidence to conclude the proportion of
low income students at Lakeville North is higher than the proportion at Lakeville South.
10. Do boys have better short term memory than girls? A random sample of 200 boys and 150 girls was
administered a short term memory test. The average score for boys was 48.9 with a standard deviation
12.96. The girls had as average score of 48.1 with standard deviation 11.85. Is there significant evidence
at the 5% level to suggest boys have better short term memory than girls? Note: higher test scores
indicate better short term memory.
Hypotheses:
H0 : B  G Boys and girls have the same mean short term memory score.
Ha : B  G Boys have a higher mean short term memory score than girls.
Conditions:
Random: 2 independent random samples – stated
Independent: there are more than 200(10) = 2000 boys and more than 150(10) = girls.
Normal: Since we have a large sample sizes (200 < 30 and 150 > 30), the sampling distribution is
approximately normal.
Calculations:
2 – sample t test
x x
48.9  48.1
t B G 
 0.6003
2
2
12.962 11.852
sB sG


200
150
nB nG
Using the t chart and df = 150 – 1 = 149, p > 0.25
Using the calculator, df = 334.6 and p  0.274
Conclusion:
Since p    0.27  0.05 , I fail to reject H0 . There is insufficient evidence to conclude boys have a
higher mean short term memory score than girls.
11. After playing a dice game with a friend, you suspect the die may not be fair. That is, you suspect some
numbers may be rolled more often than you would expect. To test your suspicion, you roll the die 300
times and record the results.
Value:
1
2
3
4
5
6
Frequency:
42
55
38
57
64
44
Is there convincing evidence that the die is not fair?
Hypotheses:
H0 : the distribution of values is uniform. The dice is fair.
Ha : some values are rolled more than others. The dice is not fair.
Conditions:
Random: rolling a die is a random value
Independent: there are more than 300(10) = 3000 dice rolls
All expected counts are 300/6 = 50 > 5
Calculations:
 2 Goodness of Fit Test
 
2
 observed
 expected
2
 42  50 

expected
50
Using the chart and df = 5, 0.05 < p < 0.10
Using the calculator and df = 5, p = 0.0677
Conclusion:
2
 55  50 

2
50
 ...  10.28
Since p    0.0677  0.05 , I fail to reject H0 . There is insufficient evidence to conclude some values
are rolled more than others. We cannot conclude the dice is unfair.
12. A study by Consumer Reports rated 77 randomly selected cereals on a 100 point scale (higher numbers are
better) and recorded the number of grams of sugar in each serving. The scatterplot, residual plot, and
computer output of the regression analysis are given.
A. Use this output to determine the LSRL for the sample data.
yˆ  59.284  2.4088 x
Where ŷ is the predicted rating and x is the amount of sugar in grams per serving.
B. Interpret the slope in the context of the situation.
If the amount of sugar increases by 1 gram/serving, we predict the rating will decrease on average
2.4088.
C. Is there convincing evidence that the slope of the true regression line is less than zero? Assume the
conditions for inference are met.
Hypotheses:
H0 :   0 there is no linear relationship between ratings and sugar content.
Ha :   0 the slope of the true regression line is less than zero. There is a negative linear relationship
between ratings and sugar content.
Conditions: met – stated
Calculations:
t Test for slope
Using the computer output, t = 1 – 10.12 and p =0
Conclusion:
Since p  any reasonable  , I reject H0 . There is sufficient evidence to conclude the slope of the true
regression line is less than zero. There is a negative linear relationship between ratings and sugar
content