Statistics for Business
(ENV)
Chapter 9
INTRODUCTION TO
HYPOTHESIS TESTING
Hypothesis Testing
9.1 Null and Alternative Hypotheses and Errors in Testing
9.2 z Tests about a Population with known σ
9.3 t Tests about a Population with unknown σ
Hypothesis testing-1
Researchers usually collect data from a sample
and then use the sample data to help answer
questions about the population.
Hypothesis testing is an inferential statistical
process that uses limited information from the
sample data to reach a general conclusion
about the population.
Hypothesis testing-2
• A hypothesis test is a formalized procedure
that follows a standard series of operations.
• In this way, researchers have a standardized
method for evaluating the results of their
research studies.
The basic experimental situation for using hypothesis testing is
presented here. It is assumed that the parameter μ is known for the
population before treatment. The purpose of the experiment
is to determine whether or not the treatment has an effect. Is the
population mean after treatment the same as or different from the
mean before treatment? A sample is selected from the treated
population to help answer this question.
Procedures of hypothesis-testing
1. First, we state a hypothesis about a population. Usually the
hypothesis concerns the value of a population parameter. For example,
we might hypothesize that the mean IQ for UIC students is μ = 110.
2. Next, we obtain a random sample from the population. For example,
we might select a random sample of n = 100 UIC students.
3. Finally, we compare the sample data with the hypothesis. If the
data are consistent with the hypothesis, we will conclude that the
hypothesis is reasonable. But if there is a big discrepancy between
the data and the hypothesis, we will decide that the hypothesis is
wrong.
Null and Alternative Hypotheses
• The null hypothesis, denoted H0, is a statement of the
basic proposition being tested. It generally represents the status
quo (a statement of “no effect” or “no difference”, or a statement of equality)
and is not rejected unless there is convincing sample evidence that it is false.
• The (scientific or) alternative hypothesis, denoted Ha (or
H1) , is an alternative (to the null hypothesis) statement
that will be accepted only if there is convincing sample evidence
that it is true.
• These two hypotheses are mutually exclusive and
exhaustive.
The critical region is determined by the level of significance, or the alpha level.
Alpha level of .05: the probability of rejecting the null hypothesis when it is true
is no more than 5%.
The locations of the critical region boundaries
for three different levels of significance
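The boundaries of a two-tailed critical region can be computed rather than read from a table. The sketch below is a minimal illustration using Python's standard library; the three alpha levels (.05, .01, .001) and the helper name two_tailed_critical_z are assumptions chosen for illustration, since the figure itself is not reproduced here.

```python
from statistics import NormalDist

def two_tailed_critical_z(alpha: float) -> float:
    """Return the positive boundary z* that leaves alpha/2 in each tail of N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha / 2)

# The three alpha levels below are assumed for illustration.
for alpha in (0.05, 0.01, 0.001):
    z_star = two_tailed_critical_z(alpha)
    print(f"alpha = {alpha}: reject H0 if z < {-z_star:.2f} or z > {z_star:.2f}")
```

A smaller alpha pushes the boundaries farther into the tails, so stronger evidence is needed to reject H0.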
Example:
Alcohol appears to be involved in a variety of birth defects,
including low birth weight and retarded growth.
A researcher would like to investigate the effect of prenatal
alcohol on birth weight. A random sample of n = 16 pregnant rats
is obtained. The mother rats are given daily doses of alcohol. At
birth, one pup is selected from each litter to produce a sample of
n = 16 newborn rats. The average weight for the sample is 15
grams. The researcher would like to compare the sample with the
general population of rats. It is known that regular newborn rats
(not exposed to alcohol) have an average weight of μ = 18 grams.
The distribution of weights is normal with σ = 4.
H0: µ = 18
1. State the hypotheses
The null hypothesis states that exposure to alcohol has no effect
on birth weight.
The alternative hypothesis states that alcohol exposure does
affect birth weight.
2. Select the level of significance (alpha level)
We will use an alpha level of .05. That is, we are taking a 5% risk
of committing a Type I error, or, the probability of rejecting the
null hypothesis when it is true is no more than 5%.
3. Set the decision criteria by locating the critical region
[Figure: critical region in the tails of the z distribution for an alpha level of .05.]
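4. Collect data and compute sample statistics
The sample mean is then converted to a z-score, which is
our test statistic.
$$z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{15 - 18}{4/\sqrt{16}} = -3$$
5. Arrive at a decision
Reject the null hypothesis: z = −3 lies in the critical region (beyond ±1.96 for a
two-tailed test with alpha = .05).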
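The arithmetic of the rat example can be checked with a short script. This is only a sketch under the stated numbers (μ = 18, σ = 4, n = 16, sample mean 15, two-tailed alpha = .05); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma = 18.0, 4.0      # population mean and sd for untreated newborn rats
n, sample_mean = 16, 15.0   # sample of pups exposed to alcohol
alpha = 0.05

standard_error = sigma / sqrt(n)              # 4 / 4 = 1
z = (sample_mean - mu0) / standard_error      # (15 - 18) / 1 = -3
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed boundary, about 1.96

reject_h0 = abs(z) > z_crit
print(f"z = {z:.2f}, critical z = +/-{z_crit:.2f}, reject H0: {reject_h0}")
```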
Hypothesis Testing
Step 1: State null and alternate hypotheses
Step 2: Select a level of significance
Step 3: Identify the test statistic
Step 4: Formulate a decision rule
Step 5: Take a sample, arrive at a decision
Do not reject null
Reject null and accept alternate
Step 1: State the null and alternate hypotheses
Null Hypothesis H0: A statement about the value of a population parameter (μ and σ).
With “=” sign. Say, “μ = 2” or “μ ≥ 2”.
Alternative Hypothesis H1: A statement that is accepted if H0 is false.
Without “=” sign. Say, “μ ≠ 2” or “μ < 2”.
Three possibilities regarding means (μ0 is a constant):
H0: μ = μ0    H1: μ ≠ μ0
H0: μ ≤ μ0    H1: μ > μ0
H0: μ ≥ μ0    H1: μ < μ0
The null hypothesis always contains equality.
Step Two: Select a Level of Significance, α
Level of Significance, α: measures the maximum probability of rejecting a true null
hypothesis. Setting α too high raises the risk of a Type I error.
Type I Error: H0 is actually true but you reject it (false positive).
Type II Error: H0 is false but you fail to reject it (false negative).
Level of Significance: the maximum allowable probability of making a
Type I error
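One way to see what the level of significance means is to simulate many experiments in which H0 is actually true and count how often the test rejects it. The sketch below is an illustration only: the function name, the number of trials, the seed, and the population values (borrowed from the rat example) are all assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def type_i_error_rate(mu0, sigma, n, alpha, trials=10_000, seed=0):
    """Estimate how often a two-tailed z test rejects H0 when H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Draw the sample from a population where H0 really holds.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

rate = type_i_error_rate(mu0=18, sigma=4, n=16, alpha=0.05)
print(f"Estimated Type I error rate: {rate:.3f}")
```

The estimated rate should land close to the chosen alpha, which is the sense in which alpha caps the probability of a Type I error.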
Risk table

Null hypothesis    Researcher accepts H0    Researcher rejects H0
H0 is true         Correct decision         Type I error (probability at most α)
H0 is false        Type II error            Correct decision
Step 3: Select the test statistic
A test statistic is used to determine whether the result of
the research study (the difference between the sample
mean and the population mean) is more than would be
expected by chance alone.
We will only consider the statistics z and t for the time being,
since our hypothesis is about the population mean.
$$z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1)$$
Test Statistic
• The term test statistic simply indicates that the
sample mean is converted into a single, specific
statistic that is used to test the hypotheses.
• The z-score statistic that is used in the
hypothesis test is the first specific example of
what is called a test statistic.
• We will introduce several other test statistics
that are used in a variety of different research
situations later.
Step 4: Formulate the decision rule.
Decision rule (the critical z is determined by the level of significance).
Reject the H0 if:

Null hypothesis    Rejection condition
H0: μ ≤ μ0         Computed z > Critical z
H0: μ ≥ μ0         Computed z < −Critical z
H0: μ = μ0         Computed z > Critical z, or Computed z < −Critical z
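The three decision rules can be collected into one small function. This is a minimal sketch, not a standard API: the function name reject_h0 and the tail labels "upper", "lower", and "two" are hypothetical names chosen for illustration.

```python
from statistics import NormalDist

def reject_h0(z: float, alpha: float, tail: str) -> bool:
    """Apply the z-test decision rule.

    tail: "upper" for H0: mu <= mu0, "lower" for H0: mu >= mu0,
          "two" for H0: mu = mu0.
    """
    std_normal = NormalDist()
    if tail == "upper":
        return z > std_normal.inv_cdf(1 - alpha)
    if tail == "lower":
        return z < -std_normal.inv_cdf(1 - alpha)
    if tail == "two":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError(f"unknown tail: {tail!r}")

# A computed z of 1.80 is significant one-tailed but not two-tailed at alpha = 0.05.
print(reject_h0(1.80, 0.05, "upper"))  # True
print(reject_h0(1.80, 0.05, "two"))    # False
```

The same computed z can lead to different decisions depending on how alpha is allocated to the tails, which is why the form of H0 must be fixed before looking at the data.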
Critical value: The dividing point between the region
where H0 is rejected and the region where H0 is not rejected,
determined by the level of significance.
From the table, with statistic z, a one-tailed test, and
significance level 0.05, we find the critical value 1.65.
[Figure: standard normal curve for H0: μ ≤ μ0. Do-not-reject region to the left of the critical value 1.65 (probability = .95); region of rejection to the right (probability = .05). Reject if z > critical z.]
One-Tailed Test of Significance
If H0: μ ≤ μ0 is true, it is very unlikely that the computed z value is so large.
[Figure: standard normal curve with critical value 1.65. Do-not-reject region to the left (probability = .95); region of rejection to the right (probability = .05).]
For H0: μ ≥ μ0, reject the H0 if computed z < −critical z.
If H0: μ ≥ μ0 is true, it is very unlikely that the computed z value
(from the sample mean) is so small.
Two-Tailed Tests of Significance
If H0: μ = μ0 is true, it is very unlikely that the computed z value is
extremely large or small.
[Figure: standard normal curve with critical values −1.96 and 1.96. Regions of rejection in both tails (probability = .025 each); do-not-reject region between them (probability = .95).]
Step 5: Make a decision.
Either reject H0 or do not reject H0.
Example
One Tailed (Upper Tailed)
• An insurance company is reviewing its current policy rates.
When originally setting the rates they believed that the
average claim amount was $1,800. They are concerned that
the true mean is actually higher than this, because they could
potentially lose a lot of money. They randomly select 40
claims, and calculate a sample mean of $1,950. Assuming that
the population standard deviation of claims is $500, and setting the
level of significance α = 0.05, test to see if the insurance
company should be concerned.
Step 1: Set the null and alternative hypotheses
H0: μ ≤ 1800
H1: μ > 1800
Step 2: Calculate the test statistic
$$z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{1950 - 1800}{500/\sqrt{40}} = 1.897$$
Step 3: Set Rejection Region
For an upper-tailed test, we need to put all of alpha in the right tail. Thus,
R : Z > 1.645
Step 4: Conclude
We can see that z = 1.897 > 1.645, thus our test
statistic is in the rejection region.
Therefore we reject the null hypothesis in favor of the alternative.
We conclude that the mean claim amount is significantly higher than $1,800, so
the insurance company has reason to be concerned about its current policies.
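As a rough check on the insurance example, the sketch below recomputes the test statistic and compares it with the upper-tailed critical value for α = 0.05, which is about 1.645 (1.96 is the two-tailed value at the same α). The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, sample_mean, alpha = 1800, 500, 40, 1950, 0.05

z = (sample_mean - mu0) / (sigma / sqrt(n))
upper_tail_crit = NormalDist().inv_cdf(1 - alpha)    # all of alpha in the right tail
two_tail_crit = NormalDist().inv_cdf(1 - alpha / 2)  # alpha split across both tails

print(f"z = {z:.3f}")
print(f"upper-tailed critical value = {upper_tail_crit:.3f}")
print(f"two-tailed critical value   = {two_tail_crit:.3f}")
print("reject H0 (upper-tailed):", z > upper_tail_crit)
```

Because the company is only worried about the mean being higher, the whole 5% belongs in the right tail, and the computed z exceeds that boundary.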
Example: One Tailed (Lower Tailed)
Trying to encourage people to stop driving to campus,
the university claims that on average it takes people 30
minutes to find a parking space on campus. John does
not think it takes so long to find a spot.
He calculated the mean time to find a parking space on
campus for the last five times and found it to be 20
minutes. Assuming that the time it takes to find a
parking spot is normally distributed, and that the
population standard deviation = 6 minutes, perform a
hypothesis test with level of significance α = 0.10
to see if his claim is correct.
Step 1: Set the null and alternative hypotheses
H0: μ ≥ 30
H1: μ < 30
Step 2: Calculate the test statistic
$$z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{20 - 30}{6/\sqrt{5}} = -3.727$$
Step 3: Set Rejection Region
For a lower-tailed test, we need to put all of alpha in the left tail. Thus,
R : Z < -1.28
Step 4: Conclude
We can see that z = -3.727 < -1.28, thus our test
statistic is in the rejection region. Therefore we
reject the null hypothesis in favor of the
alternative.
We conclude that the mean is significantly less than 30, so the data support
John's claim that the mean time to find a parking space is less than 30 minutes.
Example: Two Tailed
A sample of 40 sales receipts from a grocery store
has a mean of $137, and the population standard
deviation is $30.2. Use these values to test
whether or not the mean sales at the grocery
store is different from $150 with level of
significance α = 0.01.
Step 1: Set the null and alternative hypotheses
H0: μ = 150
H1: μ ≠ 150
Step 2: Calculate the test statistic
$$z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{137 - 150}{30.2/\sqrt{40}} = -2.722$$
Step 3: Set Rejection Region
For a two-tailed test, we need to put half of alpha in the left tail, and
the other half of alpha in the right tail. Thus, R : Z < -2.58 or Z > 2.58
Step 4: Conclude
We see that Z = -2.722 < -2.58, thus our test statistic is in
the rejection region. Therefore we reject the null
hypothesis in favor of the alternative. We can conclude that the
mean is significantly different from $150; the sample gives strong evidence that
the mean sales at the grocery store is not $150.
Example: credit manager
Lisa, the credit manager,
wants to check if the mean
monthly unpaid balance is
more than $400. The level of
significance she set is .05. A
random check of 172 unpaid
balances revealed the sample
mean to be $407. The
population standard deviation
is known to be $38.
Should Lisa conclude that
the population mean is
greater than $400, or is it
reasonable to assume that
the difference of $7 ($407 − $400) is due to chance? (at
significance level 0.05)
Step 1
H0: µ ≤ $400
H1: µ > $400
Step 2
The significance level is .05.
Step 3
Since σ is known, we can find the test statistic z.
Step 4
H0 is rejected if z > 1.65 (since α = 0.05)
Step 5
Make a decision and interpret the results.
$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\$407 - \$400}{\$38/\sqrt{172}} = 2.42$$
The p-value is .0078 for a one-tailed test.
Computed z of 2.42 > critical z of 1.65, and p of .0078 < α of .05.
Reject H0.
We can conclude that the mean unpaid
balance is greater than $400.
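The one-tailed p-value in the credit manager example is the area under the standard normal curve to the right of the computed z. The sketch below shows one way to obtain it with the standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, sample_mean, alpha = 400, 38, 172, 407, 0.05

z = (sample_mean - mu0) / (sigma / sqrt(n))
# Upper-tailed p-value: probability of a z at least this large if H0 is true.
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, one-tailed p-value = {p_value:.4f}")
print("reject H0:", p_value < alpha)
```

Because the script does not round z before looking up the tail area, its p-value may differ slightly in the last digit from a table-based value.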
Limitation of z-scores in hypothesis testing
• The limitation of z-scores in hypothesis testing
is that the population standard deviation σ (or
variance) must be known.
• What if you don’t know the µ and σ of the
population?
• Answer: use the sample variability instead
Sample variance
s² = sum of squared deviations / (n − 1)
   = sum of squared deviations / df
   = SS / df
Since you must know the sample mean before you can
compute sample variance, this places a restriction on sample
variability such that only n-1 scores in a sample are free to
vary. The value n-1 is called the degrees of freedom (or df )
for the sample variance.
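The SS/df formula can be traced step by step on a small data set. The scores below are hypothetical and exist only to illustrate the calculation; the comparison with statistics.variance simply confirms that the library uses the same n − 1 denominator.

```python
from statistics import fmean, variance

scores = [4, 6, 5, 11, 7, 9, 7, 3]   # hypothetical sample, for illustration only

n = len(scores)
mean = fmean(scores)
ss = sum((x - mean) ** 2 for x in scores)   # SS: sum of squared deviations
df = n - 1                                  # degrees of freedom
sample_variance = ss / df

# statistics.variance uses the same n - 1 denominator.
print(f"SS = {ss}, df = {df}, s^2 = {sample_variance:.3f}")
print("matches statistics.variance:", abs(sample_variance - variance(scores)) < 1e-12)
```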
z statistic (known σ):
$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
t statistic (unknown σ):
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
If you select all the possible samples of a particular size
(n), the set of all possible t statistics will form a t
distribution.
Good for: (i) a large sample (n > 30), whether or not the
underlying distribution is Normal;
(ii) a small sample (n < 30) when the
underlying distribution is Normal.
Distributions of the t statistic for different values of
degrees of freedom are compared to a normal
distribution.
The t distribution with df = 3. Note that 5% of the
distribution is located in each tail, beyond t = 2.353 and beyond t = −2.353.
The label on Fries’ Catsup indicates that the bottle contains
16 ounces of catsup.
A sample of 36 bottles from last hour’s production
revealed a mean weight of 16.12 ounces per bottle
and a sample standard deviation of 0.5 ounces. At
the 0.05 significance level, test whether the process is out of
control. That is, can we conclude that the mean
amount per bottle is different from 16 ounces?
Step 1
State the null and the alternative hypotheses.
H0: μ = 16
H1: μ ≠ 16
Step 2
Select the significance level.
The significance level is .05.
Step 3
Since the sample size is large enough
and the population s.d. is unknown, we
can use the t statistic as the test statistic.
Step 4
State the decision rule.
Reject H0 if the computed statistic is > 1.96
or < −1.96 (since α = 0.05; with n = 36 the normal critical values
are used as a large-sample approximation).
Step 5: Make a decision and
interpret the results.
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{16.12 - 16.00}{0.5/\sqrt{36}} = 1.44$$
The p-value is .1499 for a two-tailed test.
Computed value of 1.44 < critical value of 1.96, and p of .1499 > α of .05.
Do not reject the null hypothesis.
We cannot conclude the mean is different from 16 ounces.
Testing for a Population Mean: Unknown
(Population) Standard Deviation, Small Sample,
but the Underlying Distribution Is Normal
The test statistic follows the t distribution.
The critical value of t is
determined by its degrees of
freedom, which is equal to n − 1.
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
The current rate for producing 5 amp fuses at an Electric Co. is 250
per hour. A new machine has been purchased and installed.
According to the supplier, the production rates are normally
distributed. A sample of 10 randomly selected
hours from last month revealed that the mean
hourly production was 256 units, with a sample
s.d. of 6 per hour.
At the 0.05 significance level, test whether
the new machine is faster than the
old one.
Step 1
State the null and alternate hypotheses.
H0: µ ≤ 250
H1: µ > 250
Step 2
Select the level of significance. It is .05.
Step 3
Since the underlying distribution is normal and σ is
unknown, use the t distribution.
Step 4
State the decision rule.
Degrees of freedom = 10 − 1 = 9.
Reject H0 if t > 1.833
Step 5
Make a decision and interpret the results.
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{256 - 250}{6/\sqrt{10}} = 3.162$$
The p-value is 0.0058 (obtained from the t distribution; software is needed to find it).
Computed t of 3.162 > critical t of 1.833, and p of .0058 < α of .05.
Reject H0.
The mean number of fuses produced is more
than 250 per hour.
If the p-value is less than α, then reject the null
hypothesis.
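The t statistic for the fuse example can be reproduced with a few lines of code. This is a sketch only: the helper name one_sample_t is hypothetical, and because Python's standard library has no t distribution, the critical value 1.833 is taken from the table as in the text. A statistics package such as SciPy would be needed to compute the p-value itself.

```python
from math import sqrt

def one_sample_t(sample_mean: float, mu0: float, s: float, n: int) -> float:
    """t = (sample mean - hypothesized mean) / (s / sqrt(n))."""
    return (sample_mean - mu0) / (s / sqrt(n))

t = one_sample_t(sample_mean=256, mu0=250, s=6, n=10)
df = 10 - 1
critical_t = 1.833   # upper-tailed table value for df = 9, alpha = 0.05

print(f"t = {t:.3f} with df = {df}; reject H0: {t > critical_t}")
```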
Example: One-sample hypothesis test for mean
• Amount of time UIC students spend in library
from survey
– Mean 41.72 minutes
– Standard deviation 40.179 minutes
– Number of cases 294
• National survey finds university library users
spend mean of 38 minutes
• Is population mean for UIC Library users
different from national mean?
Step 1. Hypotheses
• Null hypothesis
H0: μ = μ0, that is, μ = 38
• Alternative or research hypothesis
Ha: μ ≠ μ0, that is, μ ≠ 38
Step 2. Level of significance
• Probability of error in making decision to
reject null hypothesis
• For this test choose
α = 0.05
[Figure: standard normal curve with critical values −1.96 and 1.96. Regions of rejection in both tails (probability = .025 each); do-not-reject region between them (probability = .95).]
Step 3. Test statistic
$$t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}} = \frac{41.72 - 38}{40.179/\sqrt{294}} = 1.588$$
n = 294, so use critical t values from the table for infinity (±1.96).
The computed t of 1.588 lies between −1.96 and 1.96, in the do-not-reject region.
4. Decision
• Cannot reject the null hypothesis
• Cannot conclude that population mean is
different from 38 minutes
95% confidence interval in this example:
$$E = 1.96 \times \frac{s}{\sqrt{n}} = 1.96 \times \frac{40.179}{\sqrt{294}} = 4.59$$
[41.72 − 4.59, 41.72 + 4.59] or [37.13, 46.31]
Confidence interval and hypothesis
test for library example
• Confidence interval for time spent in library is
37.13 < μ < 46.31
• Hypothesized value of 38 minutes falls within
confidence interval
• Therefore we cannot say that population
mean is not equal to 38 minutes, cannot reject
the null hypothesis
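The link between the confidence interval and the two-tailed test can be checked numerically: the hypothesized mean falls inside the 95% interval exactly when the test fails to reject at α = 0.05. The sketch below uses the library survey figures; the variable names are illustrative, and 1.96 is the large-sample critical value used in the text.

```python
from math import sqrt

sample_mean, s, n = 41.72, 40.179, 294
mu0 = 38
z_star = 1.96   # two-tailed critical value for 95% confidence, as in the text

margin = z_star * s / sqrt(n)
lower, upper = sample_mean - margin, sample_mean + margin
t = (sample_mean - mu0) / (s / sqrt(n))

inside_interval = lower < mu0 < upper
fail_to_reject = abs(t) < z_star
print(f"E = {margin:.2f}, CI = [{lower:.2f}, {upper:.2f}], t = {t:.3f}")
print("mu0 inside CI:", inside_interval, "| fail to reject H0:", fail_to_reject)
```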
Using confidence intervals or
hypothesis tests
• For parameters for a single sample…
– One-sample hypothesis test involves comparison
with pre-specified value…
– Which is often artificial…
– So confidence interval most appropriate for
reporting results
• For parameters for two samples…
– Difference in parameters is of interest
– Hypothesis test examines directly
– Confidence interval less intuitive
Confidence interval
or Hypothesis test?
• Hypothesis tests are better when the chief issue is
to make a yes/no decision about whether a pattern
exists in a population.
• Confidence intervals are better when the chief issue
is to make a best guess of a population parameter.
When reading a scientific journal, you typically will not be told explicitly that the
researcher evaluated the data using a z-score as a test statistic with an alpha level
of .05. Nor will you be told that “the null hypothesis is rejected.” Instead, you will see
a statement such as:
The treatment with medication had a significant effect on
people’s depression scores, z = 3.85, p < .05.
Let us examine this statement piece by piece.
First, what is meant by the term significant?
In statistical tests, this word indicates that the result is different from what
would be expected due to chance. A significant result means that the null
hypothesis has been rejected. That is, the data are significant because the
sample mean falls in the critical region and is not what we would have
expected to obtain if H0 were true.
Next, what is the meaning of z = 3.85? The z indicates that a z-score was used
as the test statistic to evaluate the sample data and that its value is 3.85.
Finally, what is meant by p< .05? This part of the
statement is a conventional way of specifying the alpha
level that was used for the hypothesis test. More
specifically, we are being told that an outcome as
extreme as the result of the experiment would occur
by chance with a probability (p) that is less than .05
(alpha) if H0 were true.
In circumstances where the statistical decision is to fail to reject
H0, the report might state that
There was no evidence that the medication had an effect on
depression scores, z=1.30, p> .05.
In this case, we are saying that the obtained result, z= 1.30, is
not unusual (not in the critical region) and is relatively likely to
occur by chance (the probability is greater than .05).
Thus, H0 was not rejected.
Using the p-Value in Hypothesis Testing
The p-value not only tells us whether we should reject H0, but
also tells us how confident we can be in rejecting it.
If the p-value ≥ α, H0 cannot be rejected.
If the p-value < α, H0 is rejected.
Sample means that fall in the critical
region (shaded areas) have a
probability less than alpha. H0 should
be rejected.
Another example: To test the effectiveness of eye-spot patterns in
deterring predation, a sample of n=16 insectivorous birds is selected. The
animals are tested in a box that has two separate chambers (see figure).
The birds are free to roam from one chamber to another through a
doorway in a partition. On the wall of one chamber, two large eye-spot
patterns have been painted. The other chamber has plain walls. The birds
are tested one at a time by placing them in the doorway in the center of
the apparatus.
Each animal is left in the box for 60 minutes,
and the amount of time spent in the plain
chamber is recorded. Suppose that the sample
of n=16 birds spent an average of M = 39
minutes in the plain side, with SS=540. Can we
conclude that eye-spot patterns have an effect
on behavior? Note that we have no
information about the population variance.
Step 1: State the hypotheses: H0: µplain side = 30 minutes; H1: µplain side ≠ 30 minutes
Step 2: Locate the critical region. The test statistic is a t
statistic because the population variance is not known.
df = 16 − 1 = 15
For a two-tailed test at the .05 level of significance and with 15
degrees of freedom, the critical region consists of t values
greater than +2.131 or less than −2.131.
Step 3: Calculate the test statistic
s² = SS/df = 540/15 = 36
sM = sqrt(s²/n) = sqrt(36/16) = 1.5
the t statistic t = (M − µ)/sM = (39 − 30)/1.5 = 6
Step 4: Make a decision: reject H0, because t = 6 falls in the critical region (t > 2.131).
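When only SS is reported, the t statistic is built up in three steps: sample variance, estimated standard error, then t. The sketch below follows the bird example; the variable names are illustrative, and the critical value 2.131 is the table value for df = 15 quoted in the text.

```python
from math import sqrt

n, sample_mean, ss = 16, 39.0, 540.0
mu0 = 30.0            # half of the 60-minute session, expected if eye-spots have no effect
critical_t = 2.131    # two-tailed table value for df = 15, alpha = .05

df = n - 1
variance = ss / df                   # s^2 = 540 / 15 = 36
std_error = sqrt(variance / n)       # s_M = sqrt(36 / 16) = 1.5
t = (sample_mean - mu0) / std_error  # (39 - 30) / 1.5 = 6

print(f"s^2 = {variance}, s_M = {std_error}, t = {t}")
print("reject H0:", abs(t) > critical_t)
```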
The critical region in the t distribution for alpha= .05
and df=15.
HYPOTHESIS TESTING for:
population proportions
Example: Survey data on attitudes toward
income inequality
• Imagine that we would like to find out if US adults had some net
opinion on the following issue.
• “Do you think it should or should not be the government’s
responsibility to reduce income differences between the rich and
the poor?”
Score    Response         Number
1        should be        591
0        should not be    636
Total n = 1227
• 0: Assumptions: we will be doing a large-sample
test for population proportions. To perform this
test, we must assume that…
– Sample size is large enough that np(1-p) > 10
– The sample is a random sample of some sort
– The variable is a discrete interval-scale variable, which is
automatically true for population proportions.
• 1: Hypothesis: let π denote the population
proportion who favor government intervention to
alleviate income inequality.
• Our null hypothesis is that the population, on
average, neither supports nor opposes government
intervention.
– H0: π = 0.5
• The alternate hypothesis is then
– HA: π ≠ 0.5
• 2: Test Statistic: For an n of 1227 respondents, we
calculate the following statistics:
– P = n(yes)/n(total) = 591/1227 = .4817
– σ0 = SQRT(π0(1 − π0)) = .5
– SE = σ0 / SQRT(n) = .01427
– z = (P − π0) / SE = (.4817 − .500) / .01427 = −1.282
• The z-statistic is the test statistic of interest in a large-sample test of a population proportion.
3. Pick α = 0.05 & determine critical z
[Figure: standard normal curve with critical values −1.96 and 1.96. Regions of rejection in both tails (probability = .025 each); do-not-reject region between them (probability = .95). The computed z of −1.282 falls in the do-not-reject region.]
• 4: Conclusion: The computed z of −1.282 lies between −1.96 and 1.96.
Therefore, we do not reject the
hypothesis that the population proportion is .5
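The large-sample test for a proportion can be sketched in a few lines. The variable names are illustrative, and the standard error is computed under H0 (π0 = 0.5), as in the calculation above.

```python
from math import sqrt
from statistics import NormalDist

yes, n = 591, 1227
pi0 = 0.5          # proportion under H0
alpha = 0.05

p_hat = yes / n
se = sqrt(pi0 * (1 - pi0)) / sqrt(n)   # standard error under H0
z = (p_hat - pi0) / se
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

print(f"P = {p_hat:.4f}, SE = {se:.5f}, z = {z:.3f}")
print("reject H0:", abs(z) > z_crit)
```

Because the script carries unrounded intermediate values, its z can differ in the third decimal from a hand calculation that rounds P and SE first; the decision is the same either way.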