Intermediate
Statistical Analysis
Professor K. Leppel
Hypothesis Testing
One Sample
Notation
H0: hypothesis being tested (the "null hypothesis")
Ha or H1: alternative hypothesis
Example: H0: μ = 100 versus H1: μ ≠ 100
Type I and Type II errors
Type I error – rejecting the null hypothesis when it's true.
Type II error – accepting the null hypothesis when it's false.
The possible outcomes can be summarized in a table of decision versus situation:
Decision: accept H0
  Situation: H0 is true – correct decision; probability = 1 − α = confidence level
  Situation: H0 is false – incorrect decision (type II error); probability = β
Decision: reject H0
  Situation: H0 is true – incorrect decision (type I error); probability = α = level of significance
  Situation: H0 is false – correct decision; probability = 1 − β = power of the test
Critical region
Values of the test statistic for which the null hypothesis
is rejected.
Example: Suppose you are testing whether or not the
population mean is 100.
Your test statistic is the sample mean.
If the sample mean is very far from 100, perhaps less
than 90 or more than 110, you reject the hypothesis
that the mean is 100.
Then your critical region is the set of values of the
sample mean that are less than 90 or more than 110.
Acceptance region
Values of the test statistic for which the null hypothesis
is accepted.
Example: Again suppose you are testing whether the
population mean is 100, and your test statistic is the
sample mean.
You decided to reject the null hypothesis if the sample
mean is less than 90 or more than 110.
You accept the null hypothesis if the sample mean is
between 90 and 110.
Then your acceptance region is the set of values of the
sample mean that are between 90 and 110.
Example: Suppose you want to know whether
the mean IQ at a particular university is 130.
You know the population standard deviation σ is 5.4.
So, H0: μ = 130 and H1: μ ≠ 130.
To test the hypothesis, you take a random sample of 25
observations.
The mean IQ of the sample is 128.
If the null hypothesis is correct, the center of the
distribution of the sample mean is 130.
Our critical and acceptance regions look like this graph.
We need to determine what a and b are so that we know
when we should accept our null hypothesis and when we
should reject it.
[Graph: distribution of the sample mean X̄ centered at 130, with critical regions below a and above b and the acceptance region between them.]
Suppose we want our probability α of type I error to be at most 0.05.
We want the probability of rejecting the null hypothesis when it is actually correct to be 0.05.
So we want the probability of landing in the critical region, when the distribution is centered at 130, to be 0.05.
The area in each of the two tails of the critical region is half of
0.05 or 0.025.
[Graph: same distribution centered at 130, with area 0.025 in each tail of the critical region.]
Our sample mean is X̄. According to the central limit theorem, it is normal with mean μ and standard deviation σ/√n.
So Z = (X̄ − μ)/(σ/√n) is standard normal.
Pr(X̄ ≥ b | μ = 130) = .025 = Pr(X̄ ≤ a | μ = 130)
Pr( (X̄ − μ)/(σ/√n) ≥ (b − μ)/(σ/√n) | μ = 130 ) = .025 = Pr( (X̄ − μ)/(σ/√n) ≤ (a − μ)/(σ/√n) | μ = 130 )
Pr( Z ≥ (b − μ)/(σ/√n) | μ = 130 ) = .025 = Pr( Z ≤ (a − μ)/(σ/√n) | μ = 130 )
Pr( Z ≥ (b − 130)/(5.4/√25) ) = .025 = Pr( Z ≤ (a − 130)/(5.4/√25) )
The equation we derived implies this graph of the standard normal
or Z distribution.
From the Z table, we know that the cut-off points for the tails with
area 0.025 must be 1.96 and -1.96.
[Graph: standard normal Z distribution with area .025 in each tail beyond the cut-offs −1.96 and 1.96, which correspond to (a − 130)/(5.4/√25) and (b − 130)/(5.4/√25).]
So (a − 130)/(5.4/√25) = −1.96 and (b − 130)/(5.4/√25) = 1.96.
Solving the first equation for a and the second equation
for b, we find a = 127.88 and b = 132.12.
Returning to our graph for the sample mean, we have this picture.
So if our sample mean is between 127.88 and 132.12, we accept
the null hypothesis that the population mean IQ is 130.
If our sample mean is less than 127.88 or more than 132.12, we
reject the null hypothesis and accept the alternative that the
population mean is not 130.
[Graph: distribution of X̄ centered at 130; acceptance region from 127.88 to 132.12, critical regions outside.]
Since the sample mean was 128, what would our decision be?
Accept H0: μ = 130.
If the sample mean had been 127 instead, what would our decision be?
Reject H0: μ = 130 and accept H1: μ ≠ 130.
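If you want to verify these cut-offs numerically, here is a minimal Python sketch (assuming SciPy is available; the variable names are ours, not part of the lecture):

    from scipy.stats import norm

    mu0, sigma, n, alpha = 130, 5.4, 25, 0.05      # hypothesized mean, population sd, sample size, test level
    se = sigma / n ** 0.5                          # standard deviation of the sample mean
    z_crit = norm.ppf(1 - alpha / 2)               # about 1.96
    a, b = mu0 - z_crit * se, mu0 + z_crit * se    # cut-offs for the acceptance region
    print(round(a, 2), round(b, 2))                # about 127.88 and 132.12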
In the hypothesis test we just performed, we determined the cut-off
points a and b for the sample mean, and then checked to see if our
observed values were in the acceptance or critical region.
Another way of doing the hypothesis test is to use the "p-value."
The p-value is the probability that the sample mean falls at least as far from the hypothesized mean as the value we observed, if the hypothesized mean is actually correct.
If the p-value is greater than our test level α, we accept the null hypothesis.
If the p-value is less than α, we reject the null hypothesis.
The reasoning when the p-value is less than α is this:
The probability of seeing a sample mean as extreme as the one observed, if the null hypothesis is true, is very small. Since such a result is unlikely under the null hypothesis, we reject that hypothesis.
Let’s do the same problem using the p-value method. Recall that the
sample mean was 128, the population standard deviation was 5.4, the
sample size was 25, and the test level was 0.05.
Pr(X̄ is as far from 130 as we saw, given that the population mean is 130)
= Pr(X̄ ≤ 128 or X̄ ≥ 132, given that the population mean is 130)
= 2 Pr(X̄ ≤ 128, given that the population mean is 130)
= 2 Pr( (X̄ − μ)/(σ/√n) ≤ (128 − μ)/(σ/√n) | μ = 130 )
= 2 Pr( (X̄ − μ)/(σ/√n) ≤ (128 − 130)/(5.4/√25) )
= 2 Pr( Z ≤ −1.85 )
= 2 (0.5 − 0.4678)
= 2 (0.0322)
= 0.0644
[Graph: standard normal Z with area .4678 between 0 and ±1.85 and area .0322 in each tail beyond ±1.85.]
Using a test level of α = 0.05, we accept the null hypothesis, since the p-value > 0.05.
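As a numerical check, the same two-tailed p-value can be computed with a short Python sketch (SciPy assumed; the numbers come from the example above):

    from scipy.stats import norm

    xbar, mu0, sigma, n = 128, 130, 5.4, 25
    z = (xbar - mu0) / (sigma / n ** 0.5)          # about -1.85
    p_value = 2 * norm.cdf(-abs(z))                # two-tailed p-value, about 0.064
    print(round(z, 2), round(p_value, 3))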
Quick & Dirty Method of Hypothesis Testing
1. Convert the sample mean to a standard normal Z-statistic: Z = (X̄ − μ)/(σ/√n).
2. Sketch the critical & acceptance regions in terms of a Z-statistic.
3. Make the decision.
Same problem one more time: Recall that the sample mean was 128, the population standard deviation was 5.4, the sample size was 25, and the test level was 0.05.
X̄ = 128
Z = (X̄ − μ)/(σ/√n) = (128 − 130)/(5.4/√25) = −1.85
Based on a test level (α) of 0.05, we have the graph below.
[Graph: standard normal Z; acceptance region from −1.96 to 1.96 (area .4750 on each side of 0), critical regions in the tails with area .0250 each.]
Since the Z-value -1.85 is in the acceptance
region, we accept the null hypothesis that
the population mean is 130.
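The quick-and-dirty comparison itself can be sketched in a few lines of Python (SciPy assumed):

    from scipy.stats import norm

    z = (128 - 130) / (5.4 / 25 ** 0.5)            # observed Z-statistic, about -1.85
    z_crit = norm.ppf(0.975)                       # two-tailed cut-off, about 1.96
    print(round(z, 2), abs(z) > z_crit)            # False: -1.85 lies in the acceptance region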
When a hypothesis has only one value such as
 = 130, we call it a simple hypothesis.
When a hypothesis has more than one value,
we call it a composite hypothesis.
Some examples: μ ≤ 130, μ ≥ 130, μ ≠ 130.
When we had the hypotheses H0: μ = 130 and H1: μ ≠ 130, we had a 2-tailed critical region.
If the sample mean was much larger or much smaller than 130, we rejected the null hypothesis in favor of the alternative.
However, if we have the hypotheses H0: μ ≥ 130 and H1: μ < 130, we have a 1-tailed critical region.
Now we reject the null hypothesis in favor of the alternative only if the sample mean is much smaller than 130.
If we have the hypotheses H0: μ ≤ 130 and H1: μ > 130, we also have a 1-tailed critical region.
We reject the null hypothesis in favor of the alternative only if the sample mean is much larger than 130.
The critical region depends on the alternative hypothesis, not the null.
If the alternative is “not equal to”, the critical region is 2-tailed.
If the alternative is “greater than”, the critical region is the right tail only.
If the alternative is “less than”, the critical region is the left tail only.
1-Tailed Hypothesis Test
Test H0: μ ≥ 130 versus H1: μ < 130 at the 5% level if the sample mean is 128, based on a sample of size 25, and the population standard deviation is 5.4.
We will again use these 3 methods to do this problem:
1. Determine critical region in terms of the sample
mean.
2. P-value
3. “Quick & Dirty”
1. Determine critical region in terms of the sample mean.
We sketch the graph under the assumption that the null hypothesis
is true, centering the graph at the most conservative value of that
hypothesis,  = 130.
H0: μ ≥ 130
H1: μ < 130
[Graph: distribution of X̄ centered at 130; critical region is the left tail below a, with area .05; acceptance region to the right of a.]
Since the alternative hypothesis is μ < 130, the critical region is the left tail.
Since the test level is 5%, the area of the critical region is 0.05.
Pr(X̄ ≤ a | μ = 130) = 0.05
Pr( (X̄ − μ)/(σ/√n) ≤ (a − μ)/(σ/√n) | μ = 130 ) = 0.05
Pr( Z ≤ (a − μ)/(σ/√n) | μ = 130 ) = 0.05
Pr( Z ≤ (a − 130)/(5.4/√25) ) = 0.05
[Graph: standard normal Z; critical region is the left tail with area .05 below the cut-off (a − 130)/(5.4/√25); acceptance region to the right.]
Using the Z table, we find the cut-off point must be −1.645.
So (a − 130)/(5.4/√25) = −1.645.
Solving (a − 130)/(5.4/√25) = −1.645 for a, we find a = 128.22.
[Graph: distribution of X̄ centered at 130; critical region is the left tail below 128.22 with area .05; acceptance region above 128.22.]
Our value of the sample mean was 128.
Since 128 is in the critical region, we reject the null hypothesis and accept the alternative that μ < 130.
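Here is a minimal Python sketch of this one-tailed cut-off calculation (SciPy assumed; names are illustrative):

    from scipy.stats import norm

    mu0, sigma, n, alpha = 130, 5.4, 25, 0.05
    a = mu0 + norm.ppf(alpha) * sigma / n ** 0.5   # left-tail cut-off, about 128.22
    print(round(a, 2), 128 < a)                    # True: 128 falls in the critical region, so reject H0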
2. P-value method. Keep in mind that the sample mean is 128, the
population standard deviation is 5.4, the sample size is 25, and the test
level is 0.05.
Pr(X̄ is as far below 130 as we saw, given that the population mean is 130)
= Pr(X̄ ≤ 128, given that the population mean is 130)
= Pr( (X̄ − μ)/(σ/√n) ≤ (128 − μ)/(σ/√n) | μ = 130 )
= Pr( (X̄ − μ)/(σ/√n) ≤ (128 − 130)/(5.4/√25) )
= Pr( Z ≤ −1.85 )
= 0.5 − 0.4678
= 0.0322
[Graph: standard normal Z with area .0322 in the left tail below −1.85.]
Using a test level of α = 0.05, we reject the null hypothesis and accept the alternative that μ < 130, since the p-value < 0.05.
3. “Quick & Dirty” Method. Again, remember that the
sample mean was 128, the population standard deviation
was 5.4, the sample size was 25, and the test level was 0.05.
X̄ = 128
Z = (X̄ − μ)/(σ/√n) = (128 − 130)/(5.4/√25) = −1.85
Based on a test level (α) of 0.05, we have the graph below.
[Graph: standard normal Z; critical region is the left tail below −1.645 with area .05; acceptance region above −1.645.]
Since the Z-value −1.85 is in the critical region, we reject the null hypothesis and accept the alternative that μ < 130.
Calculating β
Recall that α is the probability of type I error = Pr(reject H0 | H0 is true), and β is the probability of type II error = Pr(accept H0 | H0 is false).
Suppose, for example, we are testing at the 5% level H0: μ = 1000 versus H1: μ ≠ 1000, where the population standard deviation is 50 and the sample size is 25.
Let's first determine the critical region, and then calculate β, based on a specified value of μ.
Pr(X̄ ≥ b | μ = 1000) = .025 = Pr(X̄ ≤ a | μ = 1000)
Pr( (X̄ − μ)/(σ/√n) ≥ (b − μ)/(σ/√n) | μ = 1000 ) = .025 = Pr( (X̄ − μ)/(σ/√n) ≤ (a − μ)/(σ/√n) | μ = 1000 )
Pr( Z ≥ (b − μ)/(σ/√n) | μ = 1000 ) = .025 = Pr( Z ≤ (a − μ)/(σ/√n) | μ = 1000 )
Pr( Z ≥ (b − 1000)/(50/√25) ) = .025 = Pr( Z ≤ (a − 1000)/(50/√25) )
[Graph: distribution of X̄ centered at 1000, with area 0.025 in each tail of the critical region and the acceptance region between a and b.]
The above equation implies this graph of the standard normal
or Z distribution.
From the Z table, we know that the cut-off points for the tails with area 0.025 must be 1.96 and −1.96.
[Graph: standard normal Z with area .025 in each tail beyond ±1.96, corresponding to (a − 1000)/(50/√25) and (b − 1000)/(50/√25).]
So (a − 1000)/(50/√25) = −1.96 and (b − 1000)/(50/√25) = 1.96.
Solving the first equation for a and the second equation
for b, we find a = 980.4 and b = 1019.6.
So our critical and acceptance
regions look like this:
[Graph: distribution of X̄ centered at 1000; acceptance region from 980.4 to 1019.6, critical regions outside.]
But what if μ is actually 990, not 1000?
Our cut-off values for the critical and acceptance regions are still the same, but the distribution curve is centered at 990 instead of at 1000.
β = Pr(accepting H0 | H0 is false, in this case, μ = 990)
= Pr(being in the acceptance region | μ = 990)
= Pr(980.4 ≤ X̄ ≤ 1019.6 | μ = 990)
[Graph: distribution of X̄ now centered at 990, with the same cut-offs 980.4 and 1019.6.]
Pr(980.4 ≤ X̄ ≤ 1019.6 | μ = 990)
= Pr( (980.4 − 990)/(50/√25) ≤ (X̄ − μ)/(σ/√n) ≤ (1019.6 − 990)/(50/√25) )
= Pr( −0.96 ≤ Z ≤ 2.96 )
We determine from the Z table that the specified area is 0.3315 + 0.4985 = 0.83.
[Graph: standard normal Z with area .3315 between −0.96 and 0 and area .4985 between 0 and 2.96.]
So β, the probability of accepting the null hypothesis that μ = 1000 when μ is actually 990, is 0.83.
With the given standard deviation and sample size, it is difficult to distinguish a distribution with a mean of 990 from a distribution with a mean of 1000.
If we had a mean of 900, which is farther from 1000 than 990 is, it would be easier to distinguish it from 1000, and our value of β would be smaller, as the sketch below illustrates.
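A small Python sketch (SciPy assumed) reproduces β ≈ 0.83 for a true mean of 990 and shows how much smaller β becomes when the true mean is 900:

    from scipy.stats import norm

    a, b, sigma, n = 980.4, 1019.6, 50, 25
    se = sigma / n ** 0.5
    for true_mu in (990, 900):
        beta = norm.cdf(b, true_mu, se) - norm.cdf(a, true_mu, se)  # Pr(a <= X-bar <= b | mu)
        print(true_mu, round(beta, 4))             # about 0.83 for 990, essentially 0 for 900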
In general, we want the probabilities of type I and type II errors, α and β, to be small.
How can we make α and β smaller?
Since β is the probability of accepting the null hypothesis when it is false, we can make β smaller by accepting the null hypothesis less often (shrinking the acceptance region).
If we shrink the acceptance region, however, we expand the critical region.
That means we reject the null hypothesis more often, increasing the probability of rejecting it when it is correct.
So we've increased α.
Similarly, if we try to decrease α, we increase β.
Example: Parties
Suppose you have just completed your first semester at college.
You are reflecting back on your social experiences of the semester, and
thinking that you went to a lot of boring parties.
You are thinking that you would like to reduce the number of boring parties
that you attend.
Think of the decision to attend a party as one involving the null hypothesis
that the party is a good one and the alternative hypothesis is that the party is
boring.
You want to keep down the error probabilities α = Pr(skip a party | it's a good one) and β = Pr(attend a party | it's boring).
To reduce α, you might go to a lot of parties, but then you risk attending boring ones.
To reduce β, you might go to very few parties, but then you risk missing some good ones.
Is there some way that we can decrease both α and β?
Yes, we can decrease both α and β by collecting more data or more information, by increasing the sample size.
Unfortunately, in the real world, collecting more
data usually means increasing the cost of the
study.
So there’s a tradeoff between accuracy and cost.
In the context of our party example,
to keep down the number of boring parties you
attend as well as the number of good parties you
miss, you need to get more information.
You need to talk to more people, find out who’s
going, the music that is likely to be played, the
food that will be served, etc.
How do we do hypothesis testing when the
population standard deviation is unknown?
We do it similarly to what we've done so far, but we replace the population standard deviation σ by the sample standard deviation
s = √[ Σ (Xᵢ − X̄)² / (n − 1) ], where the sum runs from i = 1 to n,
and the Z distribution by the t distribution with n − 1 degrees of freedom:
t(n−1) = (X̄ − μ)/(s/√n).
Example: Suppose you sample 16 matchboxes.
The sample mean number of matches per box is 17 and the sample standard deviation is 2.
Test at the 5% level H0: μ = 18 versus H1: μ ≠ 18.
Use all three methods that we used previously.
Method 1
If the null hypothesis is correct, the distribution of the sample
mean is centered at 18.
We need to find the values of a and b so that the combined area in
the two tails is 0.05.
Pr(X̄ ≤ a | μ = 18) = .025 = Pr(X̄ ≥ b | μ = 18)
We don’t have to compute a and b
separately. We can just compute
one of them and figure out the
other by symmetry. We’ll do b.
[Graph: distribution of X̄ centered at 18, with area 0.025 in each tail of the critical region and the acceptance region between a and b.]
Pr(X̄ ≥ b | μ = 18) = .025
Pr( (X̄ − μ)/(s/√n) ≥ (b − μ)/(s/√n) | μ = 18 ) = .025
Pr( t(n−1) ≥ (b − μ)/(s/√n) | μ = 18 ) = .025
Pr( t15 ≥ (b − 18)/(2/√16) ) = .025
Similarly, Pr( t15 ≤ (a − 18)/(2/√16) ) = .025.
The equation we derived implies this graph of the t distribution.
From the t table, we find that
for 15 degrees of freedom, the
cut-off points for the two tails
with a combined area 0.05 are
2.131 and -2.131.
[Graph: t distribution with 15 degrees of freedom, area .025 in each tail beyond ±2.131, corresponding to (a − 18)/(2/√16) and (b − 18)/(2/√16).]
So (a − 18)/(2/√16) = −2.131 and (b − 18)/(2/√16) = 2.131.
Solving the first equation for a and the second equation
for b, we find a = 16.93 and b = 19.07, which gives us
the acceptance and critical regions shown in the graph.
Since our value for the sample mean is 17, which is in the acceptance region, we accept the null hypothesis that μ = 18.
[Graph: distribution of X̄ centered at 18; acceptance region from 16.93 to 19.07, critical regions outside.]
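A numerical check of these t-based cut-offs (Python sketch, SciPy assumed):

    from scipy.stats import t

    mu0, s, n, alpha = 18, 2, 16, 0.05
    se = s / n ** 0.5
    t_crit = t.ppf(1 - alpha / 2, n - 1)           # about 2.131 with 15 degrees of freedom
    a, b = mu0 - t_crit * se, mu0 + t_crit * se    # about 16.93 and 19.07
    print(round(a, 2), round(b, 2))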
Method 2
Recall that the sample size is 16, the sample mean number of matches is 17, the sample standard deviation is 2, and we're testing at the 5% level H0: μ = 18 versus H1: μ ≠ 18.
Since we’re doing a 2-tailed test, we need a 2-tailed p-value.
Pr(X̄ is as far from 18 as we saw, given that the population mean is 18)
= Pr(X̄ ≤ 17 or X̄ ≥ 19, given that the population mean is 18)
= 2 Pr(X̄ ≤ 17, given that the population mean is 18)
= 2 Pr( (X̄ − μ)/(s/√n) ≤ (17 − μ)/(s/√n) | μ = 18 )
= 2 Pr( (X̄ − μ)/(s/√n) ≤ (17 − 18)/(2/√16) )
= 2 Pr( t15 ≤ −2.00 )
Using Excel, we find that this 2-tailed p-value is 0.064.
[Graph: t distribution with 15 degrees of freedom, area .032 in each tail beyond ±2.00.]
So based on a test level of α = 0.05, we accept the null hypothesis, since the p-value > 0.05.
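The slide uses Excel for this probability; the same 2-tailed p-value can also be obtained with a short Python sketch (SciPy assumed):

    from scipy.stats import t

    t_stat = (17 - 18) / (2 / 16 ** 0.5)           # about -2.00
    p_value = 2 * t.sf(abs(t_stat), 16 - 1)        # two-tailed p-value, about 0.064
    print(round(t_stat, 2), round(p_value, 3))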
Method 3
Keep in mind that the sample mean was 17, the sample standard
deviation was 2, the sample size was 16, and the test level was 0.05.
X̄ = 17
t15 = (X̄ − μ)/(s/√n) = (17 − 18)/(2/√16) = −2.00
Based on a test level (α) of 0.05, we have the graph below.
[Graph: t distribution with 15 degrees of freedom; acceptance region from −2.131 to 2.131, critical regions in the tails with area .0250 each.]
Since our t15-value, -2.00, is in the acceptance
region, we accept the null hypothesis that the
population mean is 18.
New Example: Let X be the number of miles driven
when the cumulative repair cost for a car reaches
$500. Given a sample of 15 observations, a sample
mean of 22,500, and a sample standard deviation of
4,000, test at the 5% level
H0: μ ≥ 24,000 versus H1: μ < 24,000.
Use all three methods that we used previously.
Notice that for this test, the critical region is just the
left tail.
Also, since we have the sample standard deviation,
not the population standard deviation, we’ll be using a
t-statistic.
Method 1
If the null hypothesis is correct, the distribution of the
sample mean is centered at 24,000.
We need to find the value of a so that the area in the left
tail is 0.05.
Pr(X̄ ≤ a | μ = 24,000) = .05
[Graph: distribution of X̄ centered at 24,000; critical region is the left tail below a with area 0.05; acceptance region above a.]
Pr(X̄ ≤ a | μ = 24,000) = .05
Pr( (X̄ − μ)/(s/√n) ≤ (a − μ)/(s/√n) | μ = 24,000 ) = .05
Pr( t(n−1) ≤ (a − μ)/(s/√n) | μ = 24,000 ) = .05
Pr( t14 ≤ (a − 24,000)/(4,000/√15) ) = .05
The equation we derived implies this graph of the t distribution.
From the t table, we find that
for 14 degrees of freedom, the
cut-off point for a left tail
with an area of 0.05 is -1.761.
[Graph: t distribution with 14 degrees of freedom; left tail with area .05 below −1.761, corresponding to (a − 24,000)/(4,000/√15).]
So (a − 24,000)/(4,000/√15) = −1.761.
Solving the equation for a, we find a = 22,181. This
gives us the acceptance and critical regions shown in
the graph.
Since our value for the sample mean is 22,500, which is in the acceptance region, we accept the null hypothesis that μ ≥ 24,000.
[Graph: distribution of X̄ centered at 24,000; critical region is the left tail below 22,181, acceptance region above 22,181.]
Method 2
Recall that the sample size is 15, the sample mean is 22,500, the sample standard deviation is 4,000, and we're testing at the 5% level H0: μ ≥ 24,000 versus H1: μ < 24,000.
Since we’re doing a 1-tailed test, we need a 1-tailed p-value.
Pr(X̄ is as far below 24,000 as we saw, given that μ = 24,000)
= Pr(X̄ ≤ 22,500, given that μ = 24,000)
= Pr( (X̄ − μ)/(s/√n) ≤ (22,500 − μ)/(s/√n) | μ = 24,000 )
= Pr( (X̄ − μ)/(s/√n) ≤ (22,500 − 24,000)/(4,000/√15) )
= Pr( t14 ≤ −1.452 )
Using Excel, we find that this p-value is 0.084.
[Graph: t distribution with 14 degrees of freedom, area .084 in the left tail below −1.452.]
So based on a test level of α = 0.05, we accept the null hypothesis, since the p-value > 0.05.
Method 3
Keep in mind that the sample mean was 22,500,
the sample standard deviation was 4,000, the
sample size was 15, and the test level was 0.05.
X̄ = 22,500
t14 = (X̄ − μ)/(s/√n) = (22,500 − 24,000)/(4,000/√15) = −1.452
Based on a test level (α) of 0.05, we have the graph below.
[Graph: t distribution with 14 degrees of freedom; critical region is the left tail below −1.761 with area .05; acceptance region above −1.761.]
Since our t14-value, -1.452, is in the
acceptance region, we accept the null
hypothesis that the population mean is 24,000.
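All three steps of this one-tailed t-test can be checked with one short Python sketch (SciPy assumed; the variable names are ours):

    from scipy.stats import t

    xbar, mu0, s, n, alpha = 22_500, 24_000, 4_000, 15, 0.05
    se = s / n ** 0.5
    a = mu0 + t.ppf(alpha, n - 1) * se             # cut-off for the sample mean, about 22,181
    t_stat = (xbar - mu0) / se                     # about -1.452
    p_value = t.cdf(t_stat, n - 1)                 # left-tail p-value, about 0.084
    print(round(a), round(t_stat, 3), round(p_value, 3))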
Hypothesis testing
on the population proportion p
If X is the number of successes on n independent trials, then p̂ = X/n is the sample proportion of successes.
If X and n − X are each at least five, then p̂ = X/n is approximately normally distributed, with mean equal to the population proportion p and standard deviation equal to √( p(1 − p)/n ).
So (p̂ − p) / √( p(1 − p)/n ) is distributed as a standard normal or Z.
So for hypothesis testing on a population proportion, we have the statistic
Z = (p̂ − p) / √( p(1 − p)/n ).
Example
A sample of 100 people in a state shows 33% are Democrats.
Test at the 5% level H0: p ≤ 0.30 versus H1: p > 0.30.
Z = (p̂ − p)/√( p(1 − p)/n ) = (0.33 − 0.30)/√( (.30)(.70)/100 ) ≈ 0.655
Notice that our test is right-tailed, so
the critical region is as shown below.
From the Z table, we know that the
cut-off point is 1.645.
[Graph: standard normal Z; critical region is the right tail above 1.645 with area .05; acceptance region below 1.645.]
Since our Z-statistic is in the acceptance
region, we accept H0: p = 0.30
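A quick numerical check of the proportion test (Python sketch, SciPy assumed):

    from scipy.stats import norm

    p_hat, p0, n, alpha = 0.33, 0.30, 100, 0.05
    z = (p_hat - p0) / (p0 * (1 - p0) / n) ** 0.5  # about 0.65
    z_crit = norm.ppf(1 - alpha)                   # right-tail cut-off, about 1.645
    print(round(z, 3), z > z_crit)                 # False: z is below the cut-off, so accept H0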
In the section we have just completed, we did 3 different
types of hypothesis tests for the case of a single sample.
1. population mean - known population variance
2. population mean - unknown population variance
3. population proportion
In the next section we will look at hypothesis testing when we
have more than one sample.