Introduction to
Hypothesis Testing
1 Introduction
The purpose of hypothesis testing is to determine
whether there is enough statistical evidence in
favor of a certain belief about a parameter.
Examples
Is there statistical evidence in a random sample of potential
customers that supports the hypothesis that more than 10% of
the potential customers will purchase a new product?
Is a new drug effective in curing a certain disease? A sample
of patients is randomly selected. Half of them are given the
drug while the other half are given a placebo. The
improvement in the patients' conditions is then measured and
compared.
2 Concepts of Hypothesis Testing
The critical concepts of hypothesis testing.
Example:
• An operations manager needs to determine if the
mean demand during lead time is greater than
350.
• If so, changes in the ordering policy are needed.
There are two hypotheses about a population
mean:
• H0: The null hypothesis, μ = 350
• H1: The alternative hypothesis, μ > 350
• Assume the null hypothesis is true (μ = 350).
– Sample from the demand population, and build a statistic
related to the parameter hypothesized (the sample mean).
– Pose the question: How probable is it to obtain a
sample mean at least as extreme as the one observed
from the sample, if H0 is correct?
– If x̄ = 450: since x̄ is much larger than 350, the mean μ is likely
to be greater than 350. Reject the null hypothesis.
– If x̄ = 355: the sample mean is close to 350, so there is little
evidence that μ is greater than 350. Do not reject the null hypothesis.
Types of Errors
• Two types of errors may occur when deciding whether to
reject H0 based on the statistic value.
– Type I error: Reject H0 when it is true.
– Type II error: Do not reject H0 when it is false.
• Example continued
– Type I error: Reject H0 (μ = 350) in favor of H1 (μ > 350)
when the real value of μ is 350.
– Type II error: Do not reject H0 (μ = 350) when
the real value of μ is greater than 350.
Controlling the probability of
committing a Type I error
Recall:
H0: μ = 350 and H1: μ > 350.
H0 is rejected if x̄ is sufficiently large.
Thus, a Type I error is made if x̄ ≥ critical value
when μ = 350.
By properly selecting the critical value we can limit
the probability of committing a Type I error to an
acceptable level.
3 Testing the Population Mean When the
Population Standard Deviation is Known
• Example 1
– A new billing system for a department store will be cost-effective
only if the mean monthly account is more than $170.
– A sample of 400 accounts has a mean of $178.
– If accounts are approximately normally distributed with
σ = $65, can we conclude that the new system will be
cost-effective?
Testing the Population Mean (σ is Known)
Example 1 – Solution
The population of interest is the credit accounts at the
store.
We want to know whether the mean account for all
customers is greater than $170.
H1: μ > 170
– The null hypothesis must specify a single value of
the parameter μ. H0: μ = 170
Approaches to Testing
There are two approaches to testing whether the
sample mean supports the alternative hypothesis
(H1):
• The rejection region method, which is needed when testing
by hand (but can also be used with statistical software).
• The p-value method, which is mostly used when
statistical software is available.
The Rejection Region Method
The rejection region is a range of values such
that if the test statistic falls into that range, the
null hypothesis is rejected in favor of the
alternative hypothesis.
The Rejection Region Method –
for a Right - Tail Test
Example 1 – solution continued
• Recall: H0: μ = 170, H1: μ > 170.
• Therefore, it seems reasonable to reject the null hypothesis and
believe that μ > 170 if the sample mean is sufficiently large.
[Figure: sampling distribution of x̄ with the critical value of the sample mean marked; H0 is rejected to the right of it.]
• Define a critical value x̄_L for x̄ that is just large enough
to reject the null hypothesis.
• Reject the null hypothesis if x̄ ≥ x̄_L.
Determining the Critical Value for
the Rejection Region
Let the probability of committing a Type I error
be α (also called the significance level).
Find the value of the sample mean that is just large
enough so that the actual probability of committing
a Type I error does not exceed α.
Determining the Critical Value –
for a Right – Tail Test
Example 1 – solution continued
P(commit a Type I error) = P(reject H0 given that H0 is true)
= P(x̄ ≥ x̄_L given that H0 is true), which is allowed to be α.
Since P(Z ≥ z_α) = α, and the sampling distribution of x̄ under H0 has mean 170, we have:
\[ z_\alpha = \frac{\bar{x}_L - 170}{65/\sqrt{400}} \]
Solving for the critical value:
\[ \bar{x}_L = 170 + z_\alpha \frac{65}{\sqrt{400}} \]
If we select α = 0.05, then z_.05 = 1.645, so
\[ \bar{x}_L = 170 + 1.645 \cdot \frac{65}{\sqrt{400}} = 175.34 \]
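The critical value calculation can be reproduced with a short script. This is a minimal sketch using only Python's standard library (`statistics.NormalDist`, available from Python 3.8); the variable names are illustrative and not part of the original slides. It gives a critical value of about 175.35; the 175.34 quoted in the example appears to come from rounding down.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 170.0    # mean under H0
sigma = 65.0   # known population standard deviation
n = 400        # sample size
alpha = 0.05   # significance level

# z_alpha cuts off an upper-tail area of alpha under the standard normal curve.
z_alpha = NormalDist().inv_cdf(1 - alpha)

# Critical value of the sample mean for a right-tail test.
x_crit = mu0 + z_alpha * sigma / sqrt(n)

print(f"z_alpha = {z_alpha:.3f}, critical value = {x_crit:.2f}")
```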
Reject the null hypothesis if x̄ ≥ 175.34.
Conclusion
Since the sample mean (178) is greater than
the critical value of 175.34, there is sufficient
evidence to infer that the mean monthly
balance is greater than $170 at the 5%
significance level.
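As a sanity check on what "5% significance level" means, the following sketch simulates many sample means under H0 and counts how often they land in the rejection region. It is an illustration rather than part of the original example: the number of trials and the random seed are arbitrary choices, and the sample mean is drawn directly from its sampling distribution instead of simulating 400 individual accounts.

```python
import random
from math import sqrt

random.seed(0)  # arbitrary seed, only so that reruns give the same estimate

mu0, sigma, n = 170.0, 65.0, 400
x_crit = 175.34            # critical value from the example
se = sigma / sqrt(n)       # standard error of the sample mean
trials = 20000             # arbitrary number of simulated samples

# Under H0 the sample mean is normal with mean mu0 and standard deviation se.
rejections = sum(random.gauss(mu0, se) >= x_crit for _ in range(trials))
type1_rate = rejections / trials

print(f"estimated P(Type I error) = {type1_rate:.3f}")
```

The estimate should sit close to 0.05, since the critical value was chosen to make the rejection probability under H0 equal to α.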
The standardized test statistic
Instead of using the statistic x̄, we can use the
standardized value z:
\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \]
Then, the rejection region becomes z ≥ z_α (one-tail test).
Example 1 - continued
We redo this example using the standardized test statistic.
Recall: H0: μ = 170, H1: μ > 170
Test statistic:
\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{178 - 170}{65/\sqrt{400}} = 2.46 \]
Rejection region: z > z_.05 = 1.645.
Reject the null hypothesis if Z ≥ 1.645.
Conclusion
Since Z = 2.46 > 1.645, reject the null
hypothesis in favor of the alternative
hypothesis.
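The standardized version of the test can be sketched as follows. The function name `z_statistic` is a hypothetical helper introduced for illustration; the numbers are those of Example 1.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(sample_mean, mu0, sigma, n):
    """Standardize a sample mean under H0 (population sigma known)."""
    return (sample_mean - mu0) / (sigma / sqrt(n))

z = z_statistic(sample_mean=178, mu0=170, sigma=65, n=400)

# Right-tail critical value at the 5% significance level.
z_crit = NormalDist().inv_cdf(1 - 0.05)

reject_h0 = z > z_crit
print(f"z = {z:.2f}, z_crit = {z_crit:.3f}, reject H0: {reject_h0}")
```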
P-value Method
The p-value provides information about the
amount of statistical evidence that supports the
alternative hypothesis.
– The p-value of a test is the probability of observing a
test statistic at least as extreme as the one computed,
given that the null hypothesis is true.
– Let us demonstrate the concept on Example 1.
The probability of observing a
test statistic at least as extreme as 178,
given that μ = 170, is
\[ P(\bar{x} \ge 178 \mid \mu = 170) = P\left(z \ge \frac{178 - 170}{65/\sqrt{400}}\right) = P(z \ge 2.4615) = .0069 \]
This probability is the p-value.
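The p-value for Example 1 can be computed directly from the sampling distribution of the mean under H0. This is a minimal sketch with the standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 170.0, 65.0, 400
sample_mean = 178.0

# Sampling distribution of the sample mean when H0 is true.
sampling_dist = NormalDist(mu=mu0, sigma=sigma / sqrt(n))

# Right-tail test: probability of a sample mean at least as large as observed.
p_value = 1 - sampling_dist.cdf(sample_mean)

print(f"p-value = {p_value:.4f}")
```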
Interpreting the p-value
Because the probability that the sample mean
will assume a value of more than 178 when μ =
170 is so small (.0069), there are reasons to
believe that μ > 170.

Note how the event x̄ ≥ 178 is rare under H0,
when μ_x̄ = 170, but becomes more
probable under H1, when μ_x̄ > 170.
We can conclude that the smaller the p-value,
the more statistical evidence exists to support the
alternative hypothesis.
Describing the p-value
– If the p-value is less than 1%, there is overwhelming
evidence that supports the alternative hypothesis.
– If the p-value is between 1% and 5%, there is strong
evidence that supports the alternative hypothesis.
– If the p-value is between 5% and 10%, there is weak
evidence that supports the alternative hypothesis.
– If the p-value exceeds 10%, there is no evidence that
supports the alternative hypothesis.
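The verbal scale above can be written as a small lookup. The function `describe_evidence` is hypothetical, and how it treats p-values that fall exactly on a boundary (1%, 5%, 10%) is an assumption, since the slide does not say which side they belong to.

```python
def describe_evidence(p_value):
    """Map a p-value to the verbal description used on the slide.

    Assumption: a p-value exactly on a boundary goes to the weaker category.
    """
    if p_value < 0.01:
        return "overwhelming"
    if p_value < 0.05:
        return "strong"
    if p_value < 0.10:
        return "weak"
    return "none"

for p in (0.0069, 0.03, 0.07, 0.2346):
    print(f"p = {p}: {describe_evidence(p)} evidence for H1")
```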
The p-value and the Rejection Region
Methods
The p-value can be used when making decisions
based on rejection region methods as follows:
• Define the hypotheses to test, and the required
significance level α.
• Perform the sampling procedure, calculate the test
statistic and the p-value associated with it.
• Compare the p-value to α. Reject the null
hypothesis only if p-value < α; otherwise, do not
reject the null hypothesis.
[Figure: for Example 1 with α = 0.05, the observed x̄ = 178 lies beyond the critical value x̄_L = 175.34, so the p-value is smaller than α.]
Conclusions of a Test of Hypothesis
If we reject the null hypothesis, we conclude
that there is enough evidence to infer that the
alternative hypothesis is true.
If we do not reject the null hypothesis, we
conclude that there is not enough statistical
evidence to infer that the alternative hypothesis
is true.
The alternative hypothesis
is the more important
one. It represents what
we are investigating.
A Two - Tail Test
Example 2
AT&T has been challenged by competitors who
argued that their rates resulted in lower bills.
A statistics practitioner determines that the
mean and standard deviation of monthly long-distance
bills for all AT&T residential customers
are $17.09 and $3.87, respectively.
Example 2 - continued
A random sample of 100 customers is selected
and the customers’ bills are recalculated using a leading
competitor’s rates (see Xm11-02).
Assuming the standard deviation is the same
(3.87), can we infer that there is a difference
between AT&T’s bills and the competitor’s bills
(on the average)?
Solution
Is the mean different from 17.09?
H0: μ = 17.09
H1: μ ≠ 17.09
– Define the rejection region:
z ≤ -z_α/2 or z ≥ z_α/2
Solution - continued
If H0 is true (μ = 17.09), x̄ can still fall far
above or far below 17.09, in which case
we erroneously reject H0 in favor of H1
(μ ≠ 17.09).
We want this erroneous rejection of H0 to be a
rare event, say a 5% chance, so an area of
α/2 = 0.025 is placed in each tail.
Solution - continued
From the sample we have x̄ = 17.55, so
\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{17.55 - 17.09}{3.87/\sqrt{100}} = 1.19 \]
Rejection region: z ≤ -z_α/2 = -1.96 or z ≥ z_α/2 = 1.96 (α/2 = 0.025 in each tail).
Since -1.96 < z = 1.19 < 1.96, do not reject H0.
There is insufficient evidence to infer that there is a
difference between the bills of AT&T and the competitor.
Also, by the p-value approach:
The p-value = P(Z < -1.19) + P(Z > 1.19)
= 2(.1173) = .2346 > .05
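The two-tail test of Example 2 can be sketched in a few lines. The sample mean 17.55 is taken from the slide; the variable names are illustrative, and only the standard library is used.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 17.09, 3.87, 100
sample_mean = 17.55
alpha = 0.05

z = (sample_mean - mu0) / (sigma / sqrt(n))

# Two-tail test: alpha is split evenly between the two tails.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject_h0 = abs(z) >= z_crit

# Two-tail p-value: area beyond |z| in both tails.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, z_crit = {z_crit:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Both approaches agree: |z| is inside the critical values and the p-value is larger than α, so H0 is not rejected.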
4 Calculating the Probability of a
Type II Error
To properly interpret the results of a test of
hypothesis, we need to
• specify an appropriate significance level or judge the
p-value of a test;
• understand the relationship between Type I and Type II
errors.
How do we compute the probability of a Type II error?
Calculation of the Probability
of a Type II Error
To calculate the probability of a Type II error we need to
• express the rejection region directly, in terms of the
parameter hypothesized (not standardized);
• specify the alternative value under H1.
Let us revisit Example 1.
• Let us revisit Example 1.
– The rejection region was x̄ ≥ 175.34 with α = .05.
– Let the alternative value be μ = 180 (rather than just
μ > 170).
H0: μ = 170
H1: μ = 180
[Figure: sampling distributions of x̄ centered at μ = 170 and μ = 180, with x̄_L = 175.34 marked; H0 is not rejected to the left of x̄_L.]
– A Type II error occurs when a false H0 is not
rejected.
Here H0: μ = 170 is false (the true mean is μ = 180 under H1),
and it is not rejected when x̄ < 175.34.
\[ \beta = P(\bar{x} < 175.34 \mid H_0 \text{ is false}) = P(\bar{x} < 175.34 \mid \mu = 180) = P\left(z < \frac{175.34 - 180}{65/\sqrt{400}}\right) = .0764 \]
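The probability of a Type II error can be computed from the sampling distribution of the mean under the alternative value. This is a sketch with a hypothetical helper `type2_error`; it returns about .0758, slightly below the .0764 above, which appears to come from rounding z to -1.43 before using a table.

```python
from math import sqrt
from statistics import NormalDist

def type2_error(x_crit, mu_alt, sigma, n):
    """P(sample mean falls below the critical value | true mean is mu_alt)."""
    sampling_dist = NormalDist(mu=mu_alt, sigma=sigma / sqrt(n))
    return sampling_dist.cdf(x_crit)

beta = type2_error(x_crit=175.34, mu_alt=180, sigma=65, n=400)
print(f"beta = {beta:.4f}")
```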
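Effects on β of changing α
Decreasing the significance level α increases the
value of β, and vice versa.
[Figure: with μ = 170 under H0 and μ = 180 under H1, a larger significance level (α2 > α1) gives a smaller Type II error probability (β2 < β1).]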
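The trade-off between α and β can be illustrated numerically for Example 1. The helper `beta_for_alpha` and the three α values tried are assumptions made for this sketch; the alternative value μ = 180 is the one used in the example.

```python
from math import sqrt
from statistics import NormalDist

def beta_for_alpha(alpha, mu0=170.0, mu_alt=180.0, sigma=65.0, n=400):
    """Type II error probability of the right-tail test at significance level alpha."""
    se = sigma / sqrt(n)
    # Critical value of the sample mean for this alpha.
    x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Probability of not rejecting H0 when the true mean is mu_alt.
    return NormalDist(mu=mu_alt, sigma=se).cdf(x_crit)

betas = {alpha: beta_for_alpha(alpha) for alpha in (0.01, 0.05, 0.10)}
for alpha, beta in betas.items():
    print(f"alpha = {alpha:.2f} -> beta = {beta:.4f}")
```

As α grows, the critical value moves left and β shrinks, which is the pattern described above.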
Judging the Test
A hypothesis test is effectively defined by the
significance level a and by the sample size n.
If the probability of a Type II error β is judged to be
too large, we can reduce it by
• increasing α, and/or
• increasing the sample size.
Increasing the sample size reduces β.
Recall:
\[ z_\alpha = \frac{\bar{x}_L - \mu}{\sigma/\sqrt{n}}, \quad \text{thus} \quad \bar{x}_L = \mu + z_\alpha \frac{\sigma}{\sqrt{n}} \]
By increasing the sample size, the
standard deviation of the sampling
distribution of the mean decreases.
Thus, x̄_L decreases.
Note what happens when n increases:
α does not change,
but β becomes smaller.
In Example 1, suppose n increases from 400 to
1000.
\[ \bar{x}_L = \mu + z_\alpha \frac{\sigma}{\sqrt{n}} = 170 + 1.645 \cdot \frac{65}{\sqrt{1000}} = 173.38 \]
\[ \beta = P\left(Z < \frac{173.38 - 180}{65/\sqrt{1000}}\right) = P(Z < -3.22) \approx 0 \]
• α remains 5%, but the probability of a Type II
error drops dramatically.
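The effect of the sample size can be checked with a short sketch that recomputes the critical value and β for n = 400 and n = 1000. The helper `critical_value_and_beta` is hypothetical; all parameter values are those of Example 1.

```python
from math import sqrt
from statistics import NormalDist

def critical_value_and_beta(n, mu0=170.0, mu_alt=180.0, sigma=65.0, alpha=0.05):
    """Return (critical value of the sample mean, Type II error probability)."""
    se = sigma / sqrt(n)
    x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    beta = NormalDist(mu=mu_alt, sigma=se).cdf(x_crit)
    return x_crit, beta

x_crit_400, beta_400 = critical_value_and_beta(400)
x_crit_1000, beta_1000 = critical_value_and_beta(1000)

print(f"n = 400:  x_L = {x_crit_400:.2f}, beta = {beta_400:.4f}")
print(f"n = 1000: x_L = {x_crit_1000:.2f}, beta = {beta_1000:.4f}")
```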
Power of a test
The power of a test is defined as 1 - β.
It represents the probability of rejecting the null
hypothesis when it is false.
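Power depends on how far the true mean is from the hypothesized one. The sketch below evaluates the power of the Example 1 test at a few alternative means; the helper `power` and the list of alternatives are illustrative assumptions, with only μ = 180 taken from the example.

```python
from math import sqrt
from statistics import NormalDist

def power(mu_alt, mu0=170.0, sigma=65.0, n=400, alpha=0.05):
    """Probability of rejecting H0 (right-tail test) when the true mean is mu_alt."""
    se = sigma / sqrt(n)
    x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    beta = NormalDist(mu=mu_alt, sigma=se).cdf(x_crit)
    return 1 - beta

for mu_alt in (172, 175, 180):
    print(f"true mean = {mu_alt}: power = {power(mu_alt):.3f}")
```

When the true mean equals the hypothesized 170, the same function returns α, since rejecting H0 is then a Type I error.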