MA 320 Lecture Notes - Unit 8

UNIT 8: STATISTICAL HYPOTHESIS TESTING
8.1. Introduction
This chapter describes the statistical procedure for testing hypotheses, a standard
procedure that is commonly used by professionals in a wide variety of
disciplines. The two major activities of inferential statistics are the estimation of
population parameters and hypothesis testing. Hypothesis testing is important because
it provides an objective framework for making decisions using probabilistic methods
rather than relying on subjective impressions.
Definition 8.1.1
A statistical hypothesis is a claim or statement about the property of a population. It
is an assertion or conjecture concerning one or more populations. A hypothesis test
(or test of significance) is a standard procedure for testing a claim about the property
of a population.
Example 8.1.2
The following statements are typical of the hypotheses (claims) that can be tested by
the procedures we will develop later in this chapter:
(i) A computer engineer claims that the life span of computers produced by a
certain Zambian company is less than 10 years.
(ii) Medical researchers claim that the mean body temperature of healthy adults is
less than 38℃.
(iii) A food company produces packets of peanuts weighing 336 g on average. Periodically,
the quality control department takes samples of peanut packets to determine
whether the packaging process is under control.
(iv) Mendel claims that under certain circumstances, the percentage of offspring
peas with yellow pods exceeds 25%.
Before beginning to study this chapter, we should bear the following rule in mind:
Lecture Notes on Statistical Hypothesis Testing/Compiled by Angel Mukuka (PhD)
Rare Event Rule for Inferential Statistics
If, under a given assumption, the probability of a particular observed event is exceptionally
small, we conclude that the assumption is probably not correct.
Following this rule, we test the claim by analyzing the sample data in an attempt to
distinguish between results that can easily occur by chance and results that are highly
unlikely to occur by chance.
8.2. Basics of Hypothesis Testing
In this section, we describe the formal components used in hypothesis testing: null
hypothesis, alternative hypothesis, test statistic, critical region, significance level, critical
value, p-value, type I error, and type II error. In other words, the objectives of this
section are as follows:
• Given a claim, identify the null hypothesis and the alternative hypothesis, and express both in symbolic form.
• Given a claim and sample data, calculate the value of the test statistic.
• Given a significance level, identify the critical value(s).
• Given a value of the test statistic, identify the p-value.
• State the conclusion of a hypothesis test in simple, non-technical terms.
• Identify the type I error and type II error that could be made when testing a given claim.
Null and Alternative Hypotheses
Usually, a hypothesis takes the form of a claim, a belief or suspicion.
The null hypothesis (denoted by 𝐻0 ) is a statement that the value of a population
parameter (such as proportion, mean or standard deviation) is equal to some claimed
value. The symbolic statement of the null hypothesis usually contains an equal (=) sign.
In this sense, we can say that the null hypothesis is a statement asserting no change, no
effect or no difference.
The alternative hypothesis (denoted by $H_1$ or $H_a$) is the statement that the parameter
has a value that somehow differs from the null hypothesis. For the methods of this chapter,
the symbolic form of the alternative hypothesis must use one of the symbols <, > or ≠.
Below are some typical null and alternative hypotheses in relation to Example 8.1.2.
(i) $H_0: \mu = 10$ against $H_1: \mu < 10$
(ii) $H_0: \mu = 38$℃ against $H_1: \mu < 38$℃
(iii) $H_0: \mu = 336$ g against $H_1: \mu \neq 336$ g
(iv) $H_0: P = 0.25$ against $H_1: P > 0.25$
Note that (i), (ii) and (iv) are one-tailed (one-sided) tests, while (iii) is a
two-tailed (two-sided) test. In a two-tailed test, the level of significance, $\alpha$,
is divided equally between the two tails that constitute the critical region.
Caution!
1. Although some textbooks use symbols such as ≤ and ≥ in the null hypothesis
($H_0$), most professional journals use only the equal sign, and that is what is
recommended in this text. We conduct the hypothesis test by assuming that the
proportion, mean or variance is equal to some specified value so that we can work
with a single distribution having a specific value.
2. If you are conducting a study and want to use a hypothesis test to support your
claim, the claim must be worded so that it becomes the alternative hypothesis. This
requires that your claim be expressed using one of the symbols <, > or ≠. You
cannot use a hypothesis test to support a claim that some parameter is equal to
some specified value.
Test Statistic
The test statistic is a value computed from the sample data, and it is used in making the
decision about the rejection of the null hypothesis. It is found by converting the sample
statistic (such as the sample proportion $\hat{p}$, the sample mean $\bar{x}$ or the sample
variance $s^2$) to a score such as $z$, $t$ or $\chi^2$ under the assumption that the null
hypothesis is true. The test statistic can therefore be used for determining whether there
is significant evidence against the null hypothesis.
Critical region
The critical (rejection) region is the set of all values of the test statistic that cause us to
reject the null hypothesis. The critical region is bounded by the critical value at
a given level of significance (and with specified degrees of freedom in the case of the t-test,
chi-square test or F-test). A critical value is any value that separates the critical
region (where we reject the null hypothesis) from the values of the test statistic that do
not lead to the rejection of the null hypothesis. The critical value, also called the table
value, depends on the nature of the null hypothesis, the sampling distribution that applies,
and the significance level.
When a decision is made about the null hypothesis, there are two possible errors that
may be committed.
(i) Type I error: Rejection of the null hypothesis when it is actually true
(ii) Type II error: Failure to reject the null hypothesis when it is actually false
The table below gives a summary of the possible situations:

                            True State of Nature (Reality)
Decision                    $H_0$ is false               $H_0$ is true
We decide to reject $H_0$   Correct decision made        Type I error committed
We fail to reject $H_0$     Type II error committed      Correct decision made
Level of significance
The level of significance (denoted by $\alpha$) is the probability that the test statistic will fall in
the critical region when the null hypothesis is actually true. In other words, $\alpha$ = P(type I
error). The probability of a type II error is denoted by $\beta$ = P(type II error), and $1 - \beta$ is the
probability of rejecting $H_0$ when it is false; it is called the power of the test. If the level
of significance is not specified, it is conventional to use 5%.
Probability value (p – value)
This is the probability of getting a value of the test statistic that is at least as extreme as
the one representing the sample data, assuming that the null hypothesis is true. The null
hypothesis is rejected if the p-value is very small (i.e. when p-value < $\alpha$).
Decisions and conclusions
From example 8.1.2, we have seen that the original claim sometimes becomes the
alternative hypothesis. However, the standard procedure of hypothesis testing requires
that we always test the null hypothesis and so, the initial conclusion will always be one of
the following:
(i) Reject the null hypothesis, or
(ii) Fail to reject the null hypothesis.
The decision to reject or fail to reject the null hypothesis is usually made using either
the traditional (classical) method, p – value method or based on confidence intervals.
In recent years however, use of the traditional method has been declining partly
because statistical software packages are often designed for the p – value method.
(a) Traditional method: Reject $H_0$ if the test statistic falls within the critical region.
For the tests in this chapter, this means that the absolute value of the test statistic is
greater than the critical value, with the sample result lying in the direction stated by
$H_1$ for a one-tailed test.
(b) P-value method: Reject $H_0$ if p-value < $\alpha$.
(c) Confidence intervals: Because a confidence interval estimate of a population
parameter contains the likely values of that parameter, reject a claim that the
population parameter has a value that is not included in the confidence interval.
A conclusion stated as rejecting or failing to reject $H_0$ is fine for those who have taken a
statistics course, but it is important to use simple and non-technical terms in stating what the
conclusion really means. The summary below gives the wording of the final
conclusion.
Wording of the final conclusion

Start: Does the original claim contain the condition of equality?

Yes (the original claim is $H_0$). Do you reject $H_0$?
    Yes: "There is sufficient evidence to warrant rejection of the claim that ..."
    No (fail to reject $H_0$): "There is not sufficient evidence to warrant rejection of the claim that ..."

No (the original claim does not contain equality, so it becomes $H_1$). Do you reject $H_0$?
    Yes: "The sample data support the claim that ..."
    No (fail to reject $H_0$): "There is not sufficient sample evidence to support the claim that ..."
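The wording rules above can be sketched as a small lookup. This is only an illustration in Python; the function name `final_conclusion` and its parameters are assumptions made for this sketch and do not come from the notes.

```python
def final_conclusion(claim: str, claim_has_equality: bool, reject_h0: bool) -> str:
    """Return the wording of the final conclusion for a hypothesis test.

    claim_has_equality: True when the original claim is the null hypothesis.
    reject_h0: True when the test rejected the null hypothesis.
    """
    if claim_has_equality:
        # The original claim is H0, so we speak about rejecting the claim.
        if reject_h0:
            return ("There is sufficient evidence to warrant rejection "
                    f"of the claim that {claim}.")
        return ("There is not sufficient evidence to warrant rejection "
                f"of the claim that {claim}.")
    # The original claim is H1, so we speak about supporting the claim.
    if reject_h0:
        return f"The sample data support the claim that {claim}."
    return ("There is not sufficient sample evidence to support "
            f"the claim that {claim}.")


print(final_conclusion("the mean life span is less than 10 years", False, True))
```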
8.3. Tests about one population parameter
Now that we have understood the meanings of key concepts regarding hypothesis testing,
we can discuss the various tests that can be carried out regarding one population mean,
one population proportion and one population variance. The table below gives a summary
of the test statistics used in each case.
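Parameter: Population mean
Hypotheses: $H_0: \mu = \mu_0$ against $H_1: \mu > \mu_0$ or $\mu < \mu_0$ or $\mu \neq \mu_0$
  (a) Condition: population variance known, or population variance unknown with $n > 30$
      Test statistic: $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ or $z = \frac{\bar{x} - \mu_0}{S/\sqrt{n}}$
      Decision: Reject $H_0$ if $|z| > z_{\alpha}$ for a 1-tailed test and reject $H_0$ if $|z| > z_{\alpha/2}$ for a 2-tailed test
  (b) Condition: unknown population variance with $n \le 30$
      Test statistic: $t = \frac{\bar{x} - \mu_0}{S/\sqrt{n}}$
      Decision: Reject $H_0$ if $|t| > t_{\alpha,\,n-1}$ for a 1-tailed test and reject $H_0$ if $|t| > t_{\alpha/2,\,n-1}$ for a 2-tailed test

Parameter: Population proportion
Hypotheses: $H_0: P = p$ against $H_1: P > p$ or $P < p$ or $P \neq p$
      Test statistic: $z = \frac{\hat{p} - p}{\sqrt{pq/n}}$, where $q = 1 - p$
      Decision: Reject $H_0$ if $|z| > z_{\alpha}$ for a 1-tailed test and reject $H_0$ if $|z| > z_{\alpha/2}$ for a 2-tailed test

Parameter: Population variance
Hypotheses: $H_0: \sigma^2 = \sigma_0^2$ against $H_1: \sigma^2 > \sigma_0^2$ or $\sigma^2 < \sigma_0^2$ or $\sigma^2 \neq \sigma_0^2$
      Test statistic: $\chi^2 = \frac{(n - 1)S^2}{\sigma_0^2}$
      Decision: Reject $H_0$ if $\chi^2 > \chi^2_{\alpha,\,n-1}$ for a 1-tailed test and reject $H_0$ if $\chi^2 > \chi^2_{\alpha/2,\,n-1}$ for a 2-tailed test
      Note: these rules use the upper-tail critical value. Because the chi-square distribution is not symmetric, a left-tailed test instead rejects $H_0$ if $\chi^2 < \chi^2_{1-\alpha,\,n-1}$, and a 2-tailed test also rejects $H_0$ if $\chi^2 < \chi^2_{1-\alpha/2,\,n-1}$.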
If a decision is to be made using the p-value method, note that for a z-test
p-value $= P(Z > |z|)$ for a one-tailed test and p-value $= 2P(Z > |z|)$ for a two-tailed test.
In the case of a t-test,
p-value $= P(T_{n-1} > |t|)$ for a one-tailed test and p-value $= 2P(T_{n-1} > |t|)$ for a two-tailed test.
For a chi-square test,
p-value $= P(\chi^2_{n-1} > \chi^2)$ for a one-tailed test and p-value $= 2P(\chi^2_{n-1} > \chi^2)$ for a two-tailed test.
Example 8.3.1
1. A random sample of 100 deaths recorded in the US during the past year showed
an average life span of 71.8 years. Assuming a population standard deviation of
8.9 years, does this seem to indicate that the average life span today is greater
than 70 years? Use 0.05 level of significance.
Working
Hypotheses;
$H_0: \mu = 70$ and $H_1: \mu > 70$
Level of significance;
$\alpha = 0.05$
Test statistic;
Since the sample size is greater than 30 and the population standard deviation is known, we use z as
the test statistic.
$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{71.8 - 70}{8.9/\sqrt{100}} = 2.022$
Critical value;
$Z_{\alpha} = Z_{0.05} = 1.645$
P – Value (optional);
p-value $= P(Z > |z|) = P(Z > 2.02) = 0.0217$
Decision;
Since $|z| > Z_{\alpha}$, we reject $H_0$.
Note that we could have made a decision using p – value method (i.e. since p - value is
less than the level of significance, we reject Ho).
Conclusion;
Rejecting 𝐻0 at 5% level of significance indicates that the sample data supports the claim
that the average life span today is greater than 70 years.
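The steps of this worked example can be checked numerically. The sketch below assumes Python 3.8 or later and uses only the standard library; the function name `z_test_mean` and the `tail` labels are illustrative choices, not part of the notes. Because it uses the unrounded z rather than the table lookup at z = 2.02, the computed p-value can differ slightly in the last decimal place from the 0.0217 quoted above.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(x_bar, mu0, sigma, n, alpha, tail):
    """One-sample z-test for a mean; tail is 'right', 'left' or 'two'."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    elif tail == "right":
        critical = std_normal.inv_cdf(1 - alpha)
        p_value = 1 - std_normal.cdf(z)
    else:  # left-tailed test
        critical = std_normal.inv_cdf(1 - alpha)
        p_value = std_normal.cdf(z)
    # Decision by the p-value method: reject H0 when p-value < alpha.
    return z, critical, p_value, p_value < alpha


# Example 8.3.1, question 1: H0: mu = 70 against H1: mu > 70
z, critical, p_value, reject = z_test_mean(71.8, 70, 8.9, 100, 0.05, "right")
print(round(z, 3), round(critical, 3), round(p_value, 4), reject)
```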
2. The Zambian Heart Association recommends that an individual's cholesterol level
be under 200 mg per 100 ml. The following are the cholesterol readings of 16
women selected randomly from the Kitwe Heart Study:
233 197 192 179 174 217 188 209
196 167 186 221 238 179 196 191
At the 10% level of significance, do these readings suggest that women in Kitwe
have cholesterol readings below 200 mg on average? What assumptions are
required?
Working
From the given data set, $\bar{x} = 197.6875$ and $s = 20.7066$
Hypotheses;
$H_0: \mu = 200$ and $H_1: \mu < 200$
Level of significance;
$\alpha = 0.1$
Test statistic;
Since the sample size is less than 30 and population variance is unknown, we use t as a
test statistic.
$t = \frac{\bar{x} - \mu_0}{S/\sqrt{n}} = \frac{197.6875 - 200}{20.7066/\sqrt{16}} = -0.4467 \Rightarrow |t| = 0.4467$
Critical value;
$t_{\alpha,\,n-1} = t_{0.1,\,15} = 1.341 \Rightarrow |t| < t_{\alpha,\,n-1}$
P – Value (optional);
p-value $= P(T_{15} > |t|) = P(T_{15} > 0.4467)$, giving p-value > 0.25 $\Rightarrow$ p-value > 0.1
and so, p-value > $\alpha$.
Decision;
Based on either the critical value (traditional) method or p-value method, we fail to reject
$H_0$.
Conclusion;
Failure to reject $H_0$ at the 10% level of significance indicates that there is not sufficient sample
evidence to support the claim that women in Kitwe district have cholesterol readings
below 200 mg on average. The test assumes that the readings are a random sample from an
approximately normal population.
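The summary statistics and the t statistic for this example can be reproduced from the raw readings. This is a minimal sketch using Python's standard library; the variable names are illustrative, and the critical value 1.341 is simply the table value quoted in the working, since the standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev

# Cholesterol readings of the 16 women from the worked example
readings = [233, 197, 192, 179, 174, 217, 188, 209,
            196, 167, 186, 221, 238, 179, 196, 191]

mu0 = 200                   # value of the mean under H0
n = len(readings)
x_bar = mean(readings)      # sample mean
s = stdev(readings)         # sample standard deviation (divisor n - 1)

t = (x_bar - mu0) / (s / sqrt(n))

t_critical = 1.341          # table value t_{0.1, 15}, taken from the notes
reject = t < -t_critical    # left-tailed test: reject only for small t

print(x_bar, round(s, 4), round(t, 4), reject)
```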
3. A manufacturer of sports equipment has developed a new synthetic fishing line
that he claims has a mean breaking strength of 8 kg with a standard deviation of
0.5 kg. A random sample of 50 lines is tested and found to have a mean breaking
strength of 7.8 kg. Is the claim valid at the 1% level of significance?
Working
From the given data set, $\bar{x} = 7.8$, $\mu_0 = 8$, $\sigma = 0.5$ and $n = 50$
Hypotheses;
$H_0: \mu = 8$ and $H_1: \mu \neq 8$
Level of significance;
$\alpha = 0.01$
Test statistic;
Since the sample size is greater than 30 and population variance is known, we use
$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{7.8 - 8}{0.5/\sqrt{50}} = -2.83 \Rightarrow |z| = 2.83$
Critical value;
$Z_{\alpha/2} = Z_{0.005} = 2.576 \Rightarrow |z| > Z_{\alpha/2}$
P – Value (optional);
p-value $= 2P(Z > |z|) = 2P(Z > 2.83) = 0.0046 \Rightarrow$ p-value < 0.01 and so, p-value < $\alpha$.
Decision;
Based on either the critical value (traditional) method or p-value method, we reject
$H_0$.
Conclusion;
Rejecting $H_0$ at the 1% level of significance indicates that there is sufficient evidence
to warrant rejection of the claim that the mean breaking strength is 8 kg.
4. A distributor of cigarettes claims that 20% of the smokers in Myami prefer Kent
cigarettes. To test the claim, 20 smokers are selected at random and asked what
brand they prefer. If 6 of the 20 named Kent as their preference, what conclusion
do we draw?
Working
In this case, we need to carry out the test concerning proportion.
$p = 20\% = 0.2 \Rightarrow q = 1 - 0.2 = 0.8$ and $\hat{p} = \frac{6}{20} = 0.3$
Hypotheses;
$H_0: P = 0.2$ and $H_1: P \neq 0.2$
Level of significance;
$\alpha = 0.05$
Test statistic;
$z = \frac{\hat{p} - p}{\sqrt{pq/n}} = \frac{0.3 - 0.2}{\sqrt{(0.2)(0.8)/20}} = 1.12$
Critical value;
$Z_{\alpha/2} = Z_{0.025} = 1.96 \Rightarrow |z| < Z_{\alpha/2}$
P – Value (optional);
p-value $= 2P(Z > |z|) = 2P(Z > 1.12) = 0.2628 \Rightarrow$ p-value > 0.05 and so, p-value > $\alpha$.
Decision;
Based on either the critical value (traditional) method or p-value method, we fail to reject
$H_0$.
Conclusion;
Failure to reject $H_0$ at the 5% level of significance indicates that there is not sufficient evidence
to invalidate the claim that 20% of the smokers in Myami prefer Kent cigarettes.
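The proportion test above can be checked with a few lines of Python. This is a sketch under the same normal approximation the working uses; the variable names are illustrative. The p-value is computed from the unrounded z, so it differs slightly from the table-based 0.2628.

```python
from math import sqrt
from statistics import NormalDist

x, n = 6, 20        # smokers naming Kent, sample size
p0 = 0.2            # proportion claimed under H0
alpha = 0.05

p_hat = x / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

std_normal = NormalDist()
z_critical = std_normal.inv_cdf(1 - alpha / 2)   # two-tailed critical value
p_value = 2 * (1 - std_normal.cdf(abs(z)))       # two-tailed p-value
reject = abs(z) > z_critical

print(round(z, 2), round(z_critical, 2), round(p_value, 4), reject)
```

With a sample this small (here n times p0 is only 4), the normal approximation is rough, so the result is best read as indicative.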
5. The caffeine content of a certain brand of tea is claimed to be normally distributed with
a variance of 1.3 mg². Test this claim using a random sample of 8 packets of tea with
a standard deviation of 1.8 mg at the 5% level of significance.
Working
In this case, $\sigma_0^2 = 1.3$, $S = 1.8$ and $n = 8$
Hypotheses;
$H_0: \sigma^2 = 1.3$ and $H_1: \sigma^2 \neq 1.3$
Level of significance;
$\alpha = 0.05$
Test statistic;
$\chi^2 = \frac{(n - 1)S^2}{\sigma_0^2} = \frac{(7)(1.8)^2}{1.3} = 17.446$
Critical value;
$\chi^2_{\alpha/2,\,n-1} = \chi^2_{0.025,\,7} = 16.013 \Rightarrow \chi^2 > \chi^2_{\alpha/2,\,n-1}$
Decision;
Since $\chi^2 > \chi^2_{\alpha/2,\,n-1}$, we reject $H_0$.
Conclusion;
Rejecting $H_0$ at the 5% level of significance indicates that the variance is not equal to 1.3.
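The arithmetic of the chi-square statistic can be verified as follows. This is a small Python sketch with illustrative variable names; the critical value 16.013 is the table value quoted in the working, not something computed here.

```python
n = 8                # sample size
s = 1.8              # sample standard deviation
sigma0_sq = 1.3      # variance claimed under H0

# Chi-square test statistic with n - 1 degrees of freedom
chi_sq = (n - 1) * s ** 2 / sigma0_sq

chi_sq_critical = 16.013   # upper-tail table value for 7 degrees of freedom, from the notes
reject = chi_sq > chi_sq_critical

print(round(chi_sq, 3), reject)
```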
8.4. Confidence intervals and hypothesis testing
Testing $H_0: \mu = \mu_0$ against $H_1: \mu \neq \mu_0$ at the $\alpha$ level of significance is equivalent to
computing a $100(1 - \alpha)\%$ confidence interval for $\mu$. In this case, $H_0$ is rejected if $\mu_0$ is not
inside the confidence interval. If $\mu_0$ is inside the confidence interval then the null
hypothesis is not rejected.
If we consider question 3 in Example 8.3.1 then,
$H_0: \mu = 8$ and $H_1: \mu \neq 8$
At the 1% level of significance, we can construct a 99% confidence interval as follows:
$\bar{x} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 7.8 \pm Z_{0.005}\frac{0.5}{\sqrt{50}} = 7.8 \pm 2.576\left(\frac{0.5}{\sqrt{50}}\right) = (7.62, 7.98)$
Since $8 \notin (7.62, 7.98)$, we reject $H_0$ at the 1% level of significance and conclude that the
manufacturer's claim is not valid.
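The link between the confidence interval and the two-tailed test can be sketched in Python. The helper name `z_confidence_interval` is an assumption made for this illustration; it builds the interval for a mean when the population standard deviation is known and then checks whether the hypothesised value lies inside it.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, alpha):
    """Return the 100(1 - alpha)% confidence interval for a mean (sigma known)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 point of the standard normal
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# Fishing line example: H0: mu = 8 against H1: mu != 8 at the 1% level
lower, upper = z_confidence_interval(7.8, 0.5, 50, 0.01)
mu0 = 8
reject = not (lower <= mu0 <= upper)   # reject H0 when mu0 falls outside the interval

print(round(lower, 2), round(upper, 2), reject)
```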
8.5. Tests about two population parameters
In this section, we shall only concentrate on the test statistics for each of the three
parameters (mean, proportion and variance). The procedure for carrying out the test
remains the same as we did in the previous section.
8.5.1. Two population means
Test statistics and critical regions for two population means can be summarised as
follows:
To test $H_0: \mu_1 - \mu_2 = \mu_0$ against $H_1: \mu_1 - \mu_2 \neq \mu_0$ or $H_1: \mu_1 - \mu_2 > \mu_0$ or $H_1: \mu_1 - \mu_2 < \mu_0$, use:
(i) $Z = \frac{\bar{x}_1 - \bar{x}_2 - \mu_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$ if both $\sigma_1^2$ and $\sigma_2^2$ are known, and
$Z = \frac{\bar{x}_1 - \bar{x}_2 - \mu_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$ if $\sigma_1^2$ and $\sigma_2^2$ are unknown and both sample sizes are greater than 30.
(ii) $t = \frac{\bar{x}_1 - \bar{x}_2 - \mu_0}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$ if $\sigma_1^2$ and $\sigma_2^2$ are unknown but assumed to be equal.
In this case, the degrees of freedom are $\gamma = n_1 + n_2 - 2$
and the pooled variance is $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$.
(iii) $t = \frac{\bar{x}_1 - \bar{x}_2 - \mu_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$ if $\sigma_1^2$ and $\sigma_2^2$ are unknown and not equal.
In this case, the degrees of freedom are given by
$\gamma = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$
For paired observations, the mean difference is tested using a t-test with the test value
given by
$t = \frac{\bar{D} - \mu_0}{S_d/\sqrt{n}}$, with the degrees of freedom given by $\gamma = n - 1$.
Example 8.5.1
1. An experiment was performed to compare the abrasive wear of two laminated
materials. 12 pieces of material 1 gave an average (coded) wear of 85 units with
a standard deviation of 4 while 10 pieces of material 2 gave an average of 81 and
a standard deviation of 5. Can we conclude that the abrasive wear of material 1
exceeds that of material 2 by more than 2 units? Assume that the populations are
approximately normal with equal variances.
2. Business schools A and B reported the following summary of GMAT (Graduate
Management Admission Test) verbal scores.

School    Sample size (n)    Sample mean    Sample variance
A         201                34.75          48.59
B         115                33.74          30.68
At 5% level of significance, is there sufficient evidence to believe that there is a
difference in the GMAT scores of the two schools? Calculate the p – value
associated with this test.
Working
1. Let $\mu_1$ and $\mu_2$ be the population means for materials 1 and 2 respectively. Then
$H_0: \mu_1 - \mu_2 = 2$ and $H_1: \mu_1 - \mu_2 > 2$
Since the population variances are unknown but assumed to be equal, we use
$t = \frac{\bar{x}_1 - \bar{x}_2 - \mu_0}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$ where $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \frac{(11)(16) + (9)(25)}{20} = 20.05$
$\Rightarrow t = \frac{85 - 81 - 2}{\sqrt{20.05\left(\frac{1}{12} + \frac{1}{10}\right)}} = 1.043$
$t_{0.05,\,20} = 1.725 \Rightarrow |t| < t_{0.05,\,20}$
In this case, we fail to reject $H_0$ at the 5% level of significance and conclude that there is not
sufficient evidence to say that the abrasive wear of material 1 exceeds that of material 2
by more than two units.
2. Let $\mu_1$ and $\mu_2$ be the population mean scores for schools A and B respectively. Then
$H_0: \mu_1 - \mu_2 = 0$ and $H_1: \mu_1 - \mu_2 \neq 0$
In this case, both sample sizes are greater than 30, so we use
$Z = \frac{\bar{x}_1 - \bar{x}_2 - \mu_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{34.75 - 33.74 - 0}{\sqrt{\frac{48.59}{201} + \frac{30.68}{115}}} = 1.42$ and $Z_{\alpha/2} = 1.96$
Since $|Z| < Z_{\alpha/2}$, we fail to reject $H_0$ at the 5% level of significance and conclude that
there is not sufficient evidence to believe that the two schools performed differently.
p-value $= 2P(Z > 1.42) = 2(0.0778) = 0.1556$
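Both two-sample calculations in this example can be reproduced with a short Python sketch. The function names `pooled_t` and `large_sample_z` are illustrative assumptions, not names from the notes; the first takes sample standard deviations and the second takes sample variances, matching how each question states its data.

```python
from math import sqrt
from statistics import NormalDist


def pooled_t(x1, s1, n1, x2, s2, n2, diff0=0.0):
    """Pooled-variance t statistic for H0: mu1 - mu2 = diff0 (equal variances assumed)."""
    sp_sq = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    t = (x1 - x2 - diff0) / sqrt(sp_sq * (1 / n1 + 1 / n2))
    return t, sp_sq, n1 + n2 - 2   # statistic, pooled variance, degrees of freedom


def large_sample_z(x1, var1, n1, x2, var2, n2, diff0=0.0):
    """Large-sample Z statistic for H0: mu1 - mu2 = diff0 (sample variances given)."""
    return (x1 - x2 - diff0) / sqrt(var1 / n1 + var2 / n2)


# Question 1: abrasive wear, H1: mu1 - mu2 > 2
t, sp_sq, df = pooled_t(85, 4, 12, 81, 5, 10, diff0=2)

# Question 2: GMAT scores, H1: mu1 - mu2 != 0
z = large_sample_z(34.75, 48.59, 201, 33.74, 30.68, 115)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p-value

print(round(sp_sq, 2), round(t, 3), df, round(z, 2), round(p_value, 3))
```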
8.5.2. Tests about two population proportions
Tests about two population proportions are based on the sampling distribution of
$\hat{p}_1 - \hat{p}_2$.
1. For $H_0: P_1 - P_2 = 0$ against $H_1: P_1 - P_2 \neq 0$, or $P_1 - P_2 > 0$ or $P_1 - P_2 < 0$, the
following test statistic is used.
$Z = \frac{\hat{p}_1 - \hat{p}_2 - 0}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$ where $P = \frac{x_1 + x_2}{n_1 + n_2}$ is the pooled sample proportion and $Q = 1 - P$.
2. For $H_0: P_1 - P_2 = P$ against $H_1: P_1 - P_2 \neq P$, or $P_1 - P_2 > P$ or $P_1 - P_2 < P$, where $P$ now
denotes a specified non-zero difference, the following test statistic is used.
$Z = \frac{\hat{p}_1 - \hat{p}_2 - P}{\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}}$
Example 8.5.2
A poll is taken to compare the proportion of town and county voters favoring the proposal
of constructing a chemical plant. Is the proportion of town voters favoring the proposal
higher than the proportion of county voters if 120 of 200 town voters and 240 of 500
county voters favour the proposal? Use 2.5% level of significance.
Working
$H_0: P_1 - P_2 = 0$ (i.e. $P_1 = P_2$) and $H_1: P_1 - P_2 > 0$ (i.e. $P_1 > P_2$)
In this case;
$\hat{p}_1 = \frac{120}{200} = 0.6$, $\hat{p}_2 = \frac{240}{500} = 0.48$ and $P = \frac{x_1 + x_2}{n_1 + n_2} = \frac{120 + 240}{200 + 500} = 0.51$
$Z = \frac{\hat{p}_1 - \hat{p}_2 - 0}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.6 - 0.48 - 0}{\sqrt{(0.51)(0.49)\left(\frac{1}{200} + \frac{1}{500}\right)}} = 2.87$
$Z_{0.025} = 1.96 \Rightarrow |Z| > Z_{\alpha}$ and so, we reject $H_0$.
Rejecting $H_0$ at the 2.5% level of significance implies that there is sufficient sample evidence
to support the claim that the proportion of town voters favoring the proposal is higher than
the proportion of county voters.
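The pooled two-proportion test above can be checked as follows. This is a sketch with an illustrative function name, `two_proportion_z`; it keeps the pooled proportion unrounded, which gives essentially the same Z as the working.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: P1 - P2 = 0 using the pooled sample proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    z = (p1_hat - p2_hat) / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return z, pooled


# Town voters: 120 of 200; county voters: 240 of 500
z, pooled = two_proportion_z(120, 200, 240, 500)

alpha = 0.025
z_critical = NormalDist().inv_cdf(1 - alpha)   # right-tailed critical value
reject = z > z_critical

print(round(pooled, 2), round(z, 2), round(z_critical, 2), reject)
```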
8.5.3. Tests about the difference in two population variances
To test $H_0: \sigma_1^2 = \sigma_2^2$ against $H_1: \sigma_1^2 \neq \sigma_2^2$ or $\sigma_1^2 > \sigma_2^2$ or $\sigma_1^2 < \sigma_2^2$, use $F = \frac{S_1^2}{S_2^2}$ and
reject $H_0$ if $F > f_{\alpha}(n_1 - 1, n_2 - 1)$ for a one-tailed test and $F > f_{\alpha/2}(n_1 - 1, n_2 - 1)$ for a
two-tailed test.
Example 8.5.3
In the abrasive wear example, we assumed that the two unknown population variances
were equal. Were we justified in making that assumption? Use 0.10 level of significance.
Material 1: $n_1 = 12$, $\bar{x}_1 = 85$ and $s_1 = 4$
Material 2: $n_2 = 10$, $\bar{x}_2 = 81$ and $s_2 = 5$
$H_0: \sigma_1^2 = \sigma_2^2$ against $H_1: \sigma_1^2 \neq \sigma_2^2$
In this case;
$F = \frac{S_1^2}{S_2^2} = \frac{16}{25} = 0.64$ and $f_{0.05}(11, 9) = \frac{3.14 + 3.07}{2} = 3.105$
(interpolating between the table values for 10 and 12 numerator degrees of freedom).
Since $F < f_{0.05}(11, 9)$ we fail to reject $H_0$ and conclude that the assumption of equal
unknown population variances is justified.
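The F ratio and the interpolated critical value can be checked with a few lines of Python. The variable names are illustrative, and the two table entries 3.14 and 3.07 are taken directly from the working above rather than computed.

```python
s1, s2 = 4, 5                 # sample standard deviations of materials 1 and 2

f_stat = s1 ** 2 / s2 ** 2    # ratio of the sample variances

# Average of the two table entries used in the notes to approximate f_0.05(11, 9)
f_critical = (3.14 + 3.07) / 2

reject = f_stat > f_critical

print(f_stat, round(f_critical, 3), reject)
```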
Activity
1. The average life of 6 car batteries is 30 months with a standard deviation of 4 months.
The manufacturer claims that the average life of his batteries is 3 years, and a customer
claims that the manufacturer is exaggerating. If you were in the position of the customer,
would you believe the manufacturer's claim? Test the claim at the 5% level of significance.
2. A test was given to a large group of boys who scored on average 64.5. The same test
was given to a group of 400 boys who scored on average 62.5 with standard deviation
of 12.5. Examine if the difference is significant at 5% level of significance.
3. A poultry farmer is investigating ways of improving the profitability of his operation.
Using a standard diet turkeys grow to a mean mass of 4.5kg at age 4 months. A sample
of 20 turkeys which were given a special enriched diet had an average mass of 4.8kg
after 4 months. The sample standard deviation was 0.5kg. Using 5% level of
significance, test whether the new diet is effectively increasing the mass of the turkeys.
4. Aircrew escape systems are powered by a solid propellant. The burning rate of this
propellant is an important product characteristic. Specifications require that the mean
burning rate must be 50 centimeters per second. We know that the standard deviation
of burning rate is 𝜎 = 2 centimeters per second. The experimenter decides to specify
a type I error probability of 0.05 and selects a random sample of 25 and obtains a
sample average burning rate of 51.3 centimeters per second. What conclusions should
be drawn?
5. The mean water temperature downstream from a power plant cooling tower discharge
pipe should be no more than 100°F. Past experience has indicated that the standard
deviation of temperature is 2°F. The water temperature is measured on nine randomly
chosen days, and the average temperature is found to be 98°F.
(a) Should the water temperature be judged acceptable with 5% level of significance?
(b) What is the P-value for this test?
6. The means of two large samples of sizes 2000 and 1000 are 68.0 and 67.5
respectively. Can the two samples be regarded as drawn from the same population with a
standard deviation of 2.25?
7. To study the effect of a special study programme, 14 students were selected and
paired according to IQ and scholastic performance. One student from each pair was
randomly selected to participate in the special programme, while the other student
participated in the standard programme. Shortly thereafter, the students took the
national exam and obtained the following scores:
Special programme:   66 82 96 72 78 82 67
Standard programme:  60 79 92 73 75 80 69
Is there any difference in the mean scores under the two programmes? Why?
8. An administrator at a large university stated that there was a difference in the mean
grade point average of graduating males and females. A random sample of 45
graduating males gave a mean grade point average of 2.10 and a variance of 0.64,
while a random sample of 50 graduating females gave a mean grade point average of
2.45 and a variance of 0.70. By constructing a 95% confidence interval or testing the
hypothesis at 5% level of significance, would you conclude that the data support the
administrator’s belief?
9. Two varieties of maize are being tested in a developing country. 12 test plots are given
identical treatment. Six plots are sown with variety 1 and the other six plots with variety 2.
The crop scientist hopes to determine whether there is a
significant difference between the yields, using a 5% level of significance. The results
were:
Variety 1:  1.5  1.9  1.2  1.4  2.3  1.3
Variety 2:  1.6  1.8  2.0  1.8  2.3
One of the plots planted with variety 2 was accidentally given an extra dose of fertilizer,
so the result was discarded. Would you conclude that there was a significant difference
in the yield between the two varieties of maize? Assume equal population variances.
10. A coin is tossed 256 times and 132 heads are observed. Is there sufficient evidence to
conclude that the coin is biased?
11. Twenty people were affected by cholera and out of them only 18 survived. Would you
reject the hypothesis that the survival rate if affected by cholera is 85% in favour of the
hypothesis that it is more at 5% level of significance?
12. A manufacturing company claims that at least 95% of its products supplied conforms
to the specification. Out of a sample of 200 members, 18 are found to be defective.
Test the claim at 5% level of significance.
13. A certain geneticist is interested in the proportion of males and females in the
population that have a certain minor blood disorder. In a random sample of 1000 males,
250 are found to be afflicted, whereas 275 of 1000 females tested appeared to have a
disorder. Is there sufficient evidence to conclude that the proportion of females with a
minor blood disorder was higher than that of males? Test the hypothesis at 1% level
of significance
14. In a random sample of 1000 persons from city A, 400 are found to be consumers of
wheat. In another sample of 800 persons from city B, 400 are found to be consumers
of wheat. Do these data reveal a significant difference in the proportion of consumers
of wheat between the two cities?