Part VIII - Tests of Significance
Chapters 26, 28, and 29
Dr. Joseph Brennan
Math 148, BU
Tests of Significance
Confidence Intervals: Intervals on the number line which are used
to estimate the population parameter (µ) from the sample statistic
(x̄).
Tests of Significance: Tests intending to assess the evidence
provided by the data in favor of some claim about a population
parameter (µ).
A significance test is a formal procedure which uses the data to choose
between two competing hypotheses, the null hypothesis and the
alternative hypothesis.
Hypotheses are statements about the population parameter (µ). A
decision rule based on the probability computation is used to choose
between two hypotheses.
Stating Hypotheses
A hypothesis is a statement about the population parameter (µ) whose
truth is in question.
Example (Air Force recruits) (from McGHEE Introductory Statistics)
Suppose the mean weight of male Air Force recruits is thought to be
around 154 pounds. The following hypotheses can be drawn:
Verbal Statement                               Math Statement
The mean weight is 154 pounds                  H: µ = 154
The mean weight is less than 154 pounds        H: µ < 154
The mean weight is greater than 154 pounds     H: µ > 154
The mean weight is not equal to 154 pounds     H: µ ≠ 154
The tests we will develop require two hypotheses, the null hypothesis and
the alternative hypothesis.
The Null Hypothesis
Null Hypothesis: The basic or primary statement about the parameter
(µ).
We abbreviate null hypothesis as H0 .
We will consider a special case of a simple null hypothesis which has
a form (H0 : µ = µ0 ), where µ0 is the hypothesized value for µ.
The null hypothesis is usually a sceptical statement of no difference or no
effect. Generally, the null hypothesis represents some well-established
position that should not be rejected unless there is considerable evidence
to the contrary.
The Null Hypothesis
Note 1: The null hypothesis states that µ = µ0 . For any sample
from the population we do not expect that x̄ = µ0 exactly.
Recall from Part V: The probability of an exact value arising from a
continuous random variable is zero.
Even though x̄ will generally be different from µ0 , it does not mean
that we will always reject H0 . A test of significance attempts to
determine if the difference is real or if it is attributable to
chance error.
The Null Hypothesis
Note 2: We do not test to accept H0 ; it is assumed to be true.
Rather, we test to see if it should be rejected.
If we reject H0 , there must be some other hypothesis that we are
willing to accept. This is called the ALTERNATIVE HYPOTHESIS
and is abbreviated as Ha .
We establish the alternative hypothesis by (partially) negating the null
hypothesis. Since H0 involves equality, the alternative hypothesis
involves an inequality. Depending on the direction of inequality,
there exist one-sided and two-sided alternative hypotheses :
One-sided alternative hypothesis has the form:
i) (Ha : µ > µ0 ) (right-sided alternative) or
ii) (Ha : µ < µ0 ) (left-sided alternative).
Two-sided alternative hypothesis has the form (Ha : µ ≠ µ0 ).
The Null hypothesis
The direction of inequality in the alternative hypothesis (Ha ) usually
follows from the question in the problem. If the direction is not specified
in the problem, we should use the two-sided alternative as a default.
It would be an act of CHEATING to first look at the data and then frame
Ha to fit what the data shows.
The textbook calls this type of cheating data snooping.
If you do not have a specific direction firmly in mind in advance, you
should use the two-sided alternative hypothesis. Some statisticians would
argue that we should always use a two-sided alternative.
Note 3: The alternative hypothesis is also known as research
hypothesis. The alternative hypothesis is a statement that we want
to prove.
Hypothesis Examples
The flat Earth model was common in ancient times, such as in the civilizations
of the Bronze Age or Iron Age. This may be thought of as the null hypothesis,
H0 , at the time.
H0 : World is Flat.
Hellenistic astronomy established the spherical shape of the earth around 300 BC.
Many of the Ancient Greek philosophers assumed that the sun, moon and other
objects in the universe circled around the Earth.
H0 : The Geocentric Model : Earth is the center of the Universe.
Copernicus had an alternative hypothesis, Ha , that the Earth actually circled
around the sun, making the sun the center of the universe. Eventually, people were
convinced and accepted it as the null, H0 .
H0 : The Heliocentric Model: Sun is the center of the universe.
Later someone proposed an alternative hypothesis that the sun itself also circled
around something within the galaxy, thus creating a new null hypothesis. This
is how research works - the null hypothesis is refined through testing; even if it
isn’t correct, H0 is an improvement over its predecessors.
Hypothesis Examples
Example (Air Force recruits)
The mean weight of male Air Force recruits is thought to be around 154
pounds.
H0 : µ = 154,
Ha : µ ≠ 154.
(a) A 1998 study reported that the average weight of newborn kids is 7
pounds. You plan to take a simple random sample of newborns to see
if the average weight has increased.
H0 : µ = 7
Ha : µ > 7
Hypothesis Examples
(b) Last year the company’s technicians spent on average 3 hours a day
responding to customers. Does this year’s data show a different
average response time?
H0 : µ = 3
Ha : µ ≠ 3.
(c) The average square footage of one-bedroom apartments in a new
development is advertised to be 460 square feet. A student group
thinks that the apartments are smaller than advertised. They hire an
engineer to measure a sample of apartments to test their suspicion.
H0 : µ = 460
Ha : µ < 460.
Hypothesis Examples
(d) Suppose you are playing a game which involves rolling a die and you
have a feeling that 6 is appearing more often than it should! Let X be
the variable that records the number that shows up on rolling the die.
H0 : P(X = 6) = 1/6,
Ha : P(X = 6) > 1/6.
(e) Suppose you are flipping a coin, which otherwise seems fair, and seem
to believe that heads is appearing less often than it should!
H0 : P(H) = 1/2,
Ha : P(H) < 1/2.
Why Bother?
For each experiment or data set, the null hypothesis is a general default position
which needs to be substantiated or ruled out. Moreover, this needs to be done on
an experiment-by-experiment basis.
Decision Rule and Test Statistics
In testing hypotheses we speak of testing the null against the alternative
hypothesis. We either reject or do not reject the null hypothesis based
on the evidence from the data.
If the null hypothesis is rejected, we accept the alternative hypothesis.
The decision to reject H0 or not should be based on an appropriate
decision rule. A decision rule for a test is based upon a test statistic.
Test Statistic: Test statistic is a quantity computed from the data which
measures the compatibility between the null hypothesis and the data.
Test Statistic
Very often the test statistic has the following form:
Test Statistic = (estimate − hypothesized value) / (standard deviation of the estimate)
If the hypotheses are statements concerning the population mean µ, then the
test statistic has the form:
z = (x̄ − µ0)/σx̄ = (x̄ − µ0)/(σ/√n)
The above statistic is called the z-statistic for µ because it is the z-score
for x̄ (under the null hypothesis). If the population distribution is normal
or sample size is large enough, (n ≥ 30), the distribution of z is standard
normal (z ∼ N(0, 1)). From interpretation of the z-score it follows that:
The z statistic shows by how many standard deviations x̄ is smaller or
greater than µ0 (specified by H0 ).
P - value
The test statistic is used to compute the P-value of the test.
P-value: In a test of hypotheses, the P-value is the probability that the
test statistic would take a value as extreme as, or more extreme than (in the
same direction), the value actually observed. This probability is computed
under the assumption that H0 is true.
Small P-values correspond to extreme values of the test statistic and
should lead to rejection of H0 . The smaller the P-value, the stronger the
evidence against H0 . We decide to reject or not reject H0 by comparing
the P-value with the level of significance α.
Note: Usually statistical studies report the value of the test statistic and
the P-value.
Significance Level
Significance Level: The significance level, α, is a fixed constant which
denotes the critical P-value which we regard to be decisive. This amounts
to announcing in advance how much evidence against H0 we will require to
reject H0 .
The most frequently used values of α are 0.1, 0.05 or 0.01.
Rules of decision based on the P-value:
We reject H0 at α level if P-value< α. Otherwise, we fail to reject H0 .
If we reject H0 , we say that the result is statistically significant at
α level, which means that the observed difference between the data
and H0 is too large to be attributed to the chance error.
When we reject H0 , we accept Ha .
Example
Consider a test with two hypotheses
H0 : µ = 140,
Ha : µ > 140,
where a sample of size 64 has a mean 143 and standard deviation of 10.
The value of x̄ is under the right tail of the distribution as 140 < 143. The
P-value is
P-value = P(x̄ ≥ 143) = P(Z > (143 − 140)/(10/√64)) = P(Z > 2.4) = 0.0082.
Whatever traditional value of α we choose (0.01, 0.05 or 0.1), we will
reject H0 since P − value < α. So we reject H0 and accept the alternative
hypothesis as more plausible.
Case 1: Left-sided alternative (Ha : µ < µ0 )
P-value computation:
Let z be the computed value of the test statistic. The way in which we
compute the P-value depends on the direction of the alternative
hypothesis. There are three possible cases :
Case 1: Left-sided alternative (Ha : µ < µ0 )
P − value = P(Z ≤ z)
Case 2: Right-sided alternative (Ha : µ > µ0 )
Case 2: Right-sided alternative
(Ha : µ > µ0 ).
P − value = P(Z ≥ z)
Case 3: Two-sided alternative (Ha : µ ≠ µ0 )
Case 3: Two-sided alternative
(Ha : µ ≠ µ0 ).
P-value = P(|Z| ≥ |z|) = 1 − P(−|z| < Z < |z|).
The P - Value Computation
The direction of inequality in the P-value is the same as the
direction of inequality in the alternative hypothesis.
The P-value in the two-sided case is twice as large as the
P-value in the one-sided case.
The explanation is the following: we have an alternative hypothesis
(µ ≠ µ0 ). To reject the null hypothesis we should observe either
extreme positive or extreme negative values of the test statistic z.
Suppose that for a given sample we found z = −2.2. Since we are
considering extreme values in both directions in Ha , we argue that the
z-values more extreme than the observed −2.2 are:
Z ≥ 2.2 OR Z ≤ −2.2.
The P-Value Computation
Z ≥ 2.2 OR Z ≤ −2.2.
The above two inequalities can be combined into one:
Z ≤ −2.2 or Z ≥ 2.2 ⇔ |Z| ≥ 2.2.
Hence, the P-value is computed as
P-value = P(|Z| ≥ 2.2) = 1 − P(−2.2 < Z < 2.2) = 2.78%.
As a consequence, it is easier to reject H0 in favor of a one-sided
alternative because the P-value in the case of a two-sided alternative is
twice that of the P-value in the case of a one-sided alternative
hypothesis.
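As a quick numerical illustration (not part of the lecture), the three P-value formulas can be evaluated with Python's scipy.stats; the z = −2.2 from this slide is reused:

# A minimal sketch: P-values for the three alternatives, given a computed z
from scipy.stats import norm

z = -2.2                       # the example z value from this slide

p_left  = norm.cdf(z)          # left-sided  Ha: mu < mu0,  P(Z <= z)
p_right = norm.sf(z)           # right-sided Ha: mu > mu0,  P(Z >= z)
p_two   = 2 * norm.sf(abs(z))  # two-sided   Ha: mu != mu0, P(|Z| >= |z|)

print(round(p_two, 4))         # about 0.0278, the 2.78% computed above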
The P-Value Computation
The P-value provides the strength of evidence against H0 . The smaller
the P-value, the stronger the evidence against H0 .
If the P - value is less than 0.05, the result is often called
statistically significant. This is because α = 0.05 is the most
frequently used level of significance.
If the P - value is less than 0.01, the result is called highly
significant. The significance level α = 0.01 is used when we want to
reject H0 only for VERY convincing evidence against it.
Common Misinterpretation of a P - Value
Many people misunderstand what question a P - value answers.
If the P-value is 0.03, that means that, assuming H0 is true, there is a 3%
chance of observing a difference from H0 as extreme as the one you observed.
It is tempting to conclude that there is a 97% chance that the Ha is
correct and a 3% chance that the H0 is correct.
This is an incorrect interpretation!
What you can say is that random sampling from identical populations
would lead to a difference smaller than you observed in 97% of
experiments and at least as large as you observed in 3% of experiments.
You have to choose. Would you rather believe in a 3% coincidence? Or
that the H0 is incorrect?
One Sample z - Test for µ
One Sample z-Test for µ: A test to determine the validity of a
statement concerning the mean µ based upon a single sample.
STEP 1: State the hypotheses.
H0 : µ = µ0 ,
As a default, the alternative Ha is two-sided. A problem may specify
whether Ha is left-sided or right-sided.
STEP 2: Choose the significance level α.
Assume α = 0.05 unless otherwise stated.
STEP 3: Calculate the test statistic.
z = (x̄ − µ0)/(σ/√n).
One Sample z - Test for µ
STEP 4: Compute the P - value. The formula for the P-value
depends on the alternative hypothesis.
Recall that the P-value is the probability that a test statistic would
take a value more extreme than the one actually observed.
STEP 5: Make a decision:
Reject H0 if P − value < α.
Do not reject H0 if P − value ≥ α.
STEP 6: State the conclusion in terms of the alternative
hypothesis.
If you rejected H0 , say ”there is enough evidence at α level that state
your alternative hypothesis in words here”.
If you did not reject H0 , say ”there is not enough evidence at α level
to say that state your alternative hypothesis in words here”.
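The six steps can be collected into one short function. The sketch below is illustrative only (function and variable names are not from the lecture or any textbook); it assumes the z-test conditions above and reuses the earlier example with n = 64, x̄ = 143, σ = 10.

from math import sqrt
from scipy.stats import norm

def one_sample_z_test(xbar, mu0, sigma, n, alternative="two-sided", alpha=0.05):
    z = (xbar - mu0) / (sigma / sqrt(n))      # Step 3: test statistic
    if alternative == "less":                 # Step 4: P-value depends on Ha
        p = norm.cdf(z)
    elif alternative == "greater":
        p = norm.sf(z)
    else:
        p = 2 * norm.sf(abs(z))
    return z, p, p < alpha                    # Step 5: reject H0 when P-value < alpha

# The earlier example: H0: mu = 140 against the right-sided alternative
print(one_sample_z_test(143, 140, 10, 64, alternative="greater"))
# roughly (2.4, 0.0082, True)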
Assumptions Associated to the z-Test
The assumptions for the one-sample z-test are the same as assumptions
for calculating the confidence interval for µ in Chapter 21:
Assumption 1. The data results from a simple random sample
from a very large population or observations are obtained by sampling
with replacement from a box (population).
Assumption 2. The population is either normal or the sample size is
large enough (n ≥ 30) for the Central Limit Theorem to apply.
Example (from Moore and McCabe)
Do middle-aged male executives have different average blood pressure
than the general population?
The National Center for Health Statistics reports that the mean systolic
blood pressure for males 35 to 44 years is 128 and the standard deviation
in this population is 15.
The medical director of a company looks at the medical records of 72
company executives in this age group and finds that the mean systolic
blood pressure in this group is 126.07. Is this enough evidence that
executive blood pressures differ from the national average?
Solution: We will go through the steps outlined in the algorithm for
hypothesis testing.
Step 1. (H0 : µ = 128)
(Ha : µ ≠ 128)
In words, H0 says that executives are not different from other men,
whereas Ha says that they are different.
Example (Executive Blood Pressure)
Step 2. Choose α = 0.05.
Step 3.
z = (x̄ − µ0)/(σ/√n) = (126.07 − 128)/(15/√72) ≈ −1.09.
Step 4.
P-value = P(Z ≤ −1.09) + P(Z ≥ 1.09)
        = 1 − P(−1.09 < Z < 1.09)
        = 100% − (86.21% − 13.79%) = 27.58%.
Example (Executive Blood Pressure)
Step 5. Since P − value = 0.2758 > α = 0.05, the null hypothesis is not
rejected.
In fact, more than 27% (about 1 time out of 4) of times a SRS of size 72
from the general male population would produce a mean blood pressure at
least as far from 128 as that of the executive sample.
Step 6. There is not enough evidence at the α = 0.05 level that the blood
pressure of middle-aged executives differs from that of other men.
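For readers following along with software, a minimal check of these numbers (a sketch, not the lecture's own code):

from math import sqrt
from scipy.stats import norm

z = (126.07 - 128) / (15 / sqrt(72))   # about -1.09
p = 2 * norm.sf(abs(z))                # two-sided P-value, roughly 0.275
print(z, p)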
Example (Sleeping Habits)
On average, how many hours do people sleep at night? One hundred
Wal-Mart shoppers were asked this question. The sample mean was found
to be x̄ = 7.5 hours. Assume σ = 1.5 hours. Is the result significantly
different from 8 hours?
Solution: We can perform the one-sample z-test, but the conclusion will
not be valid.
The survey was not a random sample but a convenience sample, which
consists of people who are readily available and convenient (Wal-Mart
shoppers).
Wal-Mart shoppers are specific in some sense and do not constitute a
representative sample of all people. So, generalizations can not be made
to all the people by studying just a convenience sample of 100 Wal-Mart
shoppers.
Example (from Biostatistics by Triola & Triola)
A researcher is convinced that on average humans are colder than
reported. A simple random sample of 106 body temperatures was taken
and with a mean of 98.20◦ F. Assume that the population standard
deviation σ is known to be 0.62◦ F. Use a 0.05 significance level to test the
common belief that the mean body temperature of healthy adults is equal
to 98.6◦ F.
Example (Healthy Body Temperature)
Solution: We write out the following steps :
Step 1. (H0 : µ = 98.6◦ F )
(Ha : µ < 98.6◦ F ).
Step 2. α = 0.05.
Step 3. The z-score is
z = (x̄ − µ0)/(σ/√n) = (98.2 − 98.6)/(0.62/√106) = −6.642.
Step 4. The P-value is P(Z ≤ −6.642), which is way smaller than
0.05.
Step 5. We reject H0 .
Step 6. There is enough evidence at the α = 0.05 level that µ < 98.6◦ F.
Example (from Biostatistics by Triola & Triola)
The health of the bear population in Yellowstone National Park is
monitored by periodic measurements taken from anesthetized bears. A
sample of 5 bears has a mean weight of 182.9 lb. Assuming that the
standard deviation σ is known to be 121.8 lb, use a 0.1 significance level to
test the claim that the population mean of all such bear weights is 200 lb.
Solution:
Step 1. H0 : µ = 200 lb
Ha : µ ≠ 200 lb.
Step 2 and 3. α = 0.1 and the z-score is
z = (182.9 − 200)/(121.8/√5) = −0.31.
Step 4.
P-value: 1 − P(−0.31 ≤ Z ≤ 0.31) = 1 − 0.2434 = 0.7566 > 0.1
Step 5 and 6: We do not reject H0 . There is not enough evidence at
level α = 0.1 that µ ≠ 200 lb.
WARNING
In the previous example, we assumed, very conveniently, that the
distribution of the bear weights is normal. Even under this assumption, the
sample size of n = 5 is way too small to use the normal table, so our
calculations do not apply!
What do we do in such a scenario? The answer lies in the Student’s
t-distribution.
Student’s t-Distribution
Student’s t-Distribution: (or simply the t-distribution) is a family of
continuous probability distributions that arises when estimating the
mean of a normally distributed population in situations where the sample
size is small and population standard deviation is unknown.
Like the normal distribution, the t-distribution is symmetric and
bell-shaped. The t-distribution has heavier tails, meaning that it is more
prone to producing values that fall far from its mean.
Why ”Student”?
History: A derivation of the t-distribution was first published in 1908 by
William Sealy Gosset while he worked at the Guinness Brewery in Dublin.
One version of the origin of the pseudonym Student is that Gosset’s
employer forbade members of its staff from publishing scientific papers, so
he had to hide his identity.
Another version is that Guinness did not want their competition to know
that they were using the t-test to test the quality of raw material.
The t-test and the associated theory became well-known through the work
of the famous statistician R.A. Fisher, who called the distribution
Student’s distribution.
t-Distributions
There is not a single t-distribution. The t-distributions are indexed by
Degrees of Freedom, a term related to the sample size the t-distribution
represents.
For a sample of size n, use a t-distribution with n − 1 degrees of freedom.
We only need t-distributions for sample sizes less than 30.
The Family of t-Distributions
There is a whole family of t-distributions indexed by the number of
degrees of freedom.
The probability densities of all the members of the family of t - distributions are
symmetric about 0, bell-shaped, but have more probability on the tails than does
the standard normal distribution. For this reason a t - distribution is called
heavy-tailed.
t-Distributions
We denote by t^(n−1) the t-distribution with n − 1 degrees of freedom.
Consider a random variable X that is normally distributed (or at least
symmetric) from which n ≤ 30 samples are taken. Then the standardized
sample mean approximately follows t^(n−1): for the mean µ and standard
deviation σ of X,
(X̄ − µ)/(σ/√(n − 1)) ≈ t^(n−1).
Similar to confidence intervals with z-scores, a confidence level of C% has
a t-score t_C^(n−1) at the (C/2 + 50)th percentile.
Confidence Intervals Chart
To find a confidence interval for the population mean µ with confidence
level C from a sample of size n with mean x̄ and sample standard
deviation s:
(1) If the population standard deviation σ is known, and either the
population distribution is normal or the sample size is large (n ≥ 30):
[x̄ − zC × σ/√n , x̄ + zC × σ/√n]
(2) If the population standard deviation σ is unknown, then
Case 1: (n < 30 and the population distribution is normal)
[x̄ − t_C^(n−1) × s/√(n − 1) , x̄ + t_C^(n−1) × s/√(n − 1)]
Case 2: (n ≥ 30)
[x̄ − zC × s/√n , x̄ + zC × s/√n]
Example (Vitamin C)
The amount of vitamin C in mg/100g in a certain produce is measured in
a random sample of size 10:
26 31 23 21 10 25 33 12 16 30
Compute the 95% confidence interval for µ, the mean vitamin C content.
Solution: The sample mean is x̄ = 22.7 and the sample standard
deviation is s = 7.53.
Assuming that the distribution of vitamin C content in the produce is
normal, we will use the t-confidence interval since σ is unknown and
n = 10 < 30. The t-distribution in this case has 9 degrees of freedom.
Example (Vitamin C)
Since we are looking for a confidence level of 95%, we must find the
t-score of the 95/2 + 50 = 97.5th percentile. On our t-table, that is under
df = 9:
t_0.025^9 = 2.262
The 95% CI for µ is
[x̄ − t_C^(n−1) × s/√(n − 1) , x̄ + t_C^(n−1) × s/√(n − 1)]
= [22.7 − 2.262 × 7.53/√9 , 22.7 + 2.262 × 7.53/√9] = [17.03, 28.38]
We are 95% confident that the mean vitamin C content in the produce is
between 17.03 and 28.38 mg/100g.
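A quick software check of this interval (a sketch using scipy.stats.t; it keeps the lecture's convention that s is computed with divisor n and the standard error is s/√(n − 1)):

from math import sqrt
from scipy.stats import t

xbar, s, n = 22.7, 7.53, 10
tcrit = t.ppf(0.975, df=n - 1)               # 97.5th percentile with 9 df, about 2.262
half_width = tcrit * s / sqrt(n - 1)
print(xbar - half_width, xbar + half_width)  # roughly (17.02, 28.38)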
Example (Textbook Expenditures)
A random sample of semester textbook expenditures by 81 fulltime
university students had a mean of $100 and standard deviation of $30.
Find a 99% confidence interval for the mean expenditures for textbooks by
students at this university.
Solution: We have x̄ = 100 and s = 30. The population standard
deviation σ is unknown, and n = 81 > 30.
The approximate 99% confidence interval for µ is found using a z-table
(as n > 30). We must find the z-score, zC , for the 99/2 + 50 = 99.5th
percentile: zC = 2.576.
[100 − 2.576 × 30/√81 , 100 + 2.576 × 30/√81] = [91.41, 108.59]
Interpretation: We are 99% confident that the true mean expenditures
for textbooks by students in the university is between $91.41 and $108.59.
One Sample t-Test for the Population Mean µ
A test of significance for the population mean µ with sample size n ≤ 30
will use t-scores rather than z-scores.
Assume the null hypothesis states (H0 : µ = µ0 ).
The test statistic is calculated as
t_x̄ = (x̄ − µ0)/(s/√(n − 1))
The t-statistic is a basis for the t-test for µ, which has analogous steps
to the z-test with only two differences:
1. The t-test uses the t-statistic in Step 3.
2. In the t-test the P-value in Step 4 is computed as the corresponding
area under the t-curve with n − 1 degrees of freedom.
The t - tests are usually used in the case when σ is unknown, the
distribution of X is roughly normal, and the sample is small (n < 30).
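A short sketch of this procedure in Python follows (illustrative function name; SciPy assumed available). It uses the lecture's convention that s is the SD of the sample computed with divisor n, so the standard error is s/√(n − 1).

from math import sqrt
from scipy.stats import t

def one_sample_t_test(xbar, s, n, mu0, alternative="two-sided"):
    tstat = (xbar - mu0) / (s / sqrt(n - 1))   # Step 3 with the t-statistic
    df = n - 1
    if alternative == "less":
        p = t.cdf(tstat, df)
    elif alternative == "greater":
        p = t.sf(tstat, df)
    else:
        p = 2 * t.sf(abs(tstat), df)
    return tstat, p

# Bear-weight numbers from the earlier example: t about -0.28, P-value far above 0.1
print(one_sample_t_test(182.9, 121.8, 5, 200))

Since s/√(n − 1) with the divide-by-n SD equals the usual divide-by-(n − 1) SD over √n, this agrees numerically with scipy.stats.ttest_1samp whenever the raw data are available.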
Example (Piano Lessons for Preschoolers)
Do piano lessons improve the spatial-temporal reasoning of preschool
children? A study designed to test this hypothesis measured the
spatial-temporal reasoning of 20 preschool children before and after 6
months of piano lessons. The changes in reasoning scores are shown below
2  -2   5   9   7   6  -2   0   2   3
7   6   4  -1   1   3   0  -4   7  -6
Solution: Summary statistics : x̄ = 2.35, s = 3.98. The data’s histogram
is shown on the next slide.
Example (Piano Lessons for Preschoolers)
The distribution is not normal-like, but it is not extremely skewed.
Example (Piano Lessons for Preschoolers)
Sample size n = 20 < 30 and σ is unknown, so we will use the t-test. The
hypotheses are
(H0 : µ = 0)
(Ha : µ > 0)
The t-statistic is
t = (x̄ − µ0)/(s/√(n − 1)) = (2.35 − 0)/(3.98/√19) = 2.57,
with degrees of freedom n − 1 = 19. From the t-table:
P-value < 0.01.
The result is highly significant. We reject the null hypothesis and conclude
that piano lessons improve spatial-temporal reasoning of preschoolers.
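The same test can be checked from the raw scores (a sketch; scipy.stats.ttest_1samp gives the same t here because its standard error is algebraically equal to s/√(n − 1), and the 'alternative' keyword needs SciPy 1.6 or later):

from scipy.stats import ttest_1samp

changes = [2, -2, 5, 9, 7, 6, -2, 0, 2, 3, 7, 6, 4, -1, 1, 3, 0, -4, 7, -6]
result = ttest_1samp(changes, popmean=0, alternative="greater")
print(result.statistic, result.pvalue)  # t about 2.57, one-sided P-value roughly 0.009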
Example (Bear Weights)
A sample of 5 bears has a mean weight of 182.9 lb. Assuming that the
standard deviation σ is known to be 121.8 lb, use a 0.1 significance level to
test the claim that the population mean of all such bear weights is 200 lb.
Solution: We shall use the t-test as n = 5 < 30 and assume the weight
distribution is normal.
Step 1. (H0 : µ = 200 lb)    (Ha : µ ≠ 200 lb)
Step 2 and 3. α = 0.1 and the t-score is
t = (182.9 − 200)/(121.8/√4) = −0.28
Step 4. The P-value is
2P(t4 ≥ |−0.28|) > 2 · 0.25 = 0.5 > 0.1.
Step 5 and 6. We do not reject H0 . There is not enough evidence at
level α = 0.1 that µ ≠ 200 lb.
Chart for Tests of Significance for µ
We have two types of test statistic for the null hypothesis (H0 : µ = µ0 ).
(1) If the sample size is large (n > 30), then
test statistic = (x̄ − µ0)/(s/√n)
and use the normal table to calculate a P-value.
(2) If the sample size is small (n ≤ 30) and the population distribution is
roughly normal, then
test statistic = (x̄ − µ0)/(s/√(n − 1))
and use the t-table with n − 1 degrees of freedom to calculate a P-value.
Example (from McGHEE Introductory Statistics).
Rats that are raised in a laboratory environment have a mean life span of
around 24 months. A sample of 31 rats reared to adulthood in a germ-free
environment had life spans with a mean of 27.3 and a standard deviation
of 5.9 months. Does this type of rearing have an effect on the life span of
the laboratory rat?
Solution: We are given
x̄ = 27.3, s = 5.9, n = 31.
Step 1. (H0 : µ = 24)    (Ha : µ > 24).
Note: We set the alternative as one-sided since we would expect the
lifetime to increase in the germ-free environment.
Step 2. Choose α = 0.05.
Example (Rat Lifespan)
Step 3. Calculate the test statistic:
z = (x̄ − µ0)/(s/√n) = (27.3 − 24)/(5.9/√31) ≈ 3.1
Step 4. Calculate the P-value:
P-value = P(Z ≥ 3.10) ≈ 0.001.
Step 5. The result is highly significant, so we reject H0 .
Step 6. There is enough statistical evidence at α = 0.05 level that the
average lifetime of rats living in a germ-free environment is greater than
24 months.
Two-Sided Significance Tests and Confidence Intervals
Confidence intervals for µ and two-sided significance tests for µ are
related. We can decide whether we should or should not reject H0 from
the computed two-sided CI for µ.
Relationship between the two-sided test of significance and
the confidence interval for µ.
A level α two-sided significance test rejects the hypothesis (H0 : µ = µ0 )
exactly when the value µ0 falls outside the (1 − α) · 100% confidence interval
for µ.
The significance level α of the two-sided test is related to the
confidence level C of the confidence interval through the rule
C = 1 − α.
Two-Sided Significance Tests and Confidence Intervals
H0 : µ = µ0 , Ha : µ ≠ µ0 , with α = 1 − C
Case 1: µ0 is inside of the C% CI.
Decision: Fail to reject H0 at level α.
Case 2: µ0 is outside of the C% CI.
Decision: Reject H0 at level α.
Example (Chicken)
A company that manufactures chicken feed has developed a new product.
The company claims that 12 weeks after hatching, the average weight of
chickens using this product will be 3.0 pounds. The owner of a large
chicken farm decided to examine this new product, so he fed the new
ration to all 12,000 of his newly hatched chickens.
At the end of 12 weeks he selected a simple random sample of 20 chickens
and weighed them. The sample mean for the 20 chickens is 3.06 pounds
and the sample standard deviation was s = 0.63 pounds.
(a) Find a 95% confidence interval for the mean µ of the 12,000 chickens.
(b) Perform a test of significance to check a company’s claim (α = 0.05).
Example (Chicken)
Solution: We have 19 degrees of freedom and:
s = 0.63, n = 20, x̄ = 3.06, µ0 = 3.
95% Confidence Interval: x̄ ± t_0.025^19 × s/√(n − 1)
[3.06 − 2.093 × 0.63/√19 , 3.06 + 2.093 × 0.63/√19] = [2.76, 3.36]
The hypotheses:
H0 : µ = 3,    Ha : µ ≠ 3.
The test statistic:
t = (x̄ − µ0)/(s/√(n − 1)) = (3.06 − 3)/(0.63/√19) = 0.415.
P-value = 2P(t^19 ≥ 0.415) > 2(0.10) = 0.2
Example (Chicken)
We fail to reject H0 since P − value > α.
Conclusion: We do not have enough statistical evidence at α = 0.05 level
to claim that the mean chicken weight is different from 3 pounds. So the
data supports the company’s claim.
NOTE: The computed 95% confidence interval contains µ0 = 3 (pounds)
and the test of significance which uses α = 0.05 = 1 − 0.95 level does not
reject H0 .
TRUE: If µ0 belongs to the 1 − α confidence interval, then the α-level
two-sided test fails to reject H0 .
TRUE: If the 1 − α confidence interval does not contain µ0 , then the
α-level two-sided test rejects H0 .
Two Types of Error in Tests of Significance
We either reject H0 , or fail to reject H0 based on the data. We hope that
our decision is correct, but sometimes it will be wrong! There are two
types of incorrect decisions:
TYPE I and TYPE II ERRORS
If we reject H0 when in fact H0 is true, this is TYPE I error.
If we do not reject H0 when in fact Ha is true, this is TYPE II error.
Probabilities of Type I and Type II Errors
Type I error can be thought of as convicting an innocent person.
Type II error can be thought of as letting a guilty person go free.
Significance and Type I error.
From definition of the significance level α it follows that the probability
of Type I error is equal to α. This explains why we want to choose α to
be small.
Power of the test and Type II error.
The power of the test is the probability that the test rejects H0 when Ha
is true. High power is desirable. The probability of the Type II error is
1 minus the power of the test.
Practical Significance vs. Statistical Significance
Refer to Section 3 of Chapter 29 of the textbook.
Sometimes the difference is statistically significant but practically
unimportant. The following example illustrates the point.
Example (Bulb)
An engineer has designed an improved light bulb. The previous design had
an average lifetime of 1200 hours. Based on a sample of n = 2500 of the
new bulbs, the average lifetime was found to be x̄ = 1201. Take σ = 10
(hours). Does the new bulb have a greater lifetime?
Solution: The hypotheses:
H0 : µ = 1200,    Ha : µ > 1200.
Example (Bulb)
Test statistic:
z = (x̄ − µ0)/(σ/√n) = (1201 − 1200)/(10/√2500) = 5.
The P-value:
P-value = P(Z ≥ 5) ≈ 0.
We reject H0 and conclude that we have enough evidence that the new
bulb is better. But how is it better? Is the lifetime increase of 1 hour for a
light bulb really important?
REMARK: Statistical significance is easier to show with larger sample
sizes n. Even a tiny difference between the true mean µ and the
hypothesized mean µ0 will be evident if we choose a large enough sample.
Example (Types of Error)
A medical researcher is working on a new treatment for a certain type of
cancer. The average survival time after diagnosis on the standard
treatment is 2 years. In an early trial, she tries the new treatment on three
subjects who have an average survival time after the diagnosis of 4 years.
Although the survival time has doubled, the results are not statistically
significant even at the 0.10 significance level. Suppose, in fact, that the
new treatment does increase the mean survival time in the population of
all patients with this particular type of cancer. What type of error, if any,
has been committed?
Example (Types of Error)
Solution : The hypotheses are
H0 : µ = 2,
Ha : µ > 2.
That the results are not statistically significant means that we fail to reject
H0 . But we know that the new treatment does increase the mean survival
time which means that, in fact, Ha is true. So we failed to reject H0 when
in fact Ha is true. This is Type II error.
Comment: Having just 3 patients was not enough to establish the statistical
significance of the result.
Concluding Remarks about Significance Tests
(i) We only discussed the one-sample significance tests for µ. Many
other significance tests exist. In fact, a test of significance can be
constructed for any population parameter or their difference. (see
Chapter 27).
(ii) Different tests have different technical details (such as different
hypotheses, test statistics and rules for P-value computation), but all
the significance tests use the same steps and definitions, and have a
similar interpretation.
(iii) A chance model is required for a test of significance; a box model is a
type of chance model.
Concluding Remarks about Significance Tests
(iv) A test of significance only determines if a difference is real or due to
chance variation. It does not rank how important the difference is,
explain what causes it, or check the validity of the study used to
accumulate data.
(v) The z-test and t-test are tests which compare the mean of a sample
to the mean established by an external standard.
(vi) The χ2 -test, our next topic, compares observed and expected
frequencies.
The Chi-Square Test
Often, we must ask the basic and necessary question :
How well does the model fit the facts?
In many cases, the answer is given by the χ2 -test.
χ is a Greek letter.
It is often written as chi.
It is pronounced as ki as in kite.
The χ2 -test compares observed and expected frequencies in determining if
a model is appropriate.
The Chi-Square Test: An Example
A gambler is accused of using a loaded die, but he pleads innocent. A
record has been kept of the last 60 throws.
4 2 3 6 6 5 3 4 3 4
3 4 3 1 4 4 6 6 1 3
5 2 2 3 2 3 4 3 4 3
3 5 5 3 6 3 4 3 6 2
4 5 6 4 4 4 6 3 5 3
5 4 3 1 6 4 1 5 2 4
There is some disagreement about how to interpret the data and a
statistician is called in. What is the verdict?
The Chi-Square Test: An Example
Solution: If the gambler is innocent, the numbers in the given table
should be a result of drawing randomly from a box containing the numbers
1 through 6. Therefore, each of the six numbers should appear in the
table approximately 10 times: the expected frequency is 10. To compare
this observation with what we have, we have to calculate the frequency
distribution:
Value                  1    2    3    4    5    6   Sum
Observed frequency     4    6   17   16    8    9    60
Expected frequency    10   10   10   10   10   10    60
As we can observe, the table has too many 3’s and 4’s.
The Chi-Square Test: An Example
The standard error for the number of 3’s is
√(n · p(1 − p)) = √(60 · (1/6) · (5/6)) ≈ 2.9.
Therefore, the observed number of 3’s is (17 − 10)/2.9 ≈ 2.4 SEs above
the expected number.
However, we shouldn’t take the table one line at a time! For example,
there are too many 4’s. But with many lines in the table, there is a high
probability that at least one of them will look suspicious - even if the die is
fair! We need something more substantial to assess the fairness of the die.
The value χ² is defined as
χ² = sum of [(observed frequency − expected frequency)² / expected frequency].
The Chi-Square Test: An Example
The formula is not arbitrarily derived, as we shall see later. For now,
χ² = [(4 − 10)² + (6 − 10)² + · · · + (8 − 10)² + (9 − 10)²] / 10 = 14.2.
When the observed frequency is far from the expected frequency, the
corresponding term in the sum is large; when the two are close, this
term is small.
Question: What is the chance that when a fair die is rolled 60 times
and χ2 is computed from the observed frequencies, its value turns out to
be 14.2 or more?
Note that larger values of χ2 would be even stronger evidence against the
model.
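Today this chance can be computed in one line; a sketch with scipy.stats.chi2, using the 5 degrees of freedom worked out on a later slide:

from scipy.stats import chi2

print(chi2.sf(14.2, df=5))   # about 0.014, i.e. roughly 1.4%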
Karl Pearson
Calculating this chance is a tremendous undertaking! Back in the days of
Karl Pearson (1900’s) there were no computers.
He came up with a distribution to compute this probability by hand! It
involved a new curve, called the χ2 -curve. There is one curve for each
number of degrees of freedom, analogous to the t-distribution. Moreover,
if everything is specified, then
degrees of freedom = (number of terms in χ²) − 1.
The χ2 -Curve
How does a χ2 -curve look?
Properties of the χ2 -Distribution
1. The χ²-distribution is not symmetric, unlike the Student’s t or the normal
distribution.
2. The values of χ2 can never be negative.
3. The χ2 -distribution is different for each number of degrees of
freedom, which is given by
df = n − 1,
where n is the number of categories.
The χ2 -Table
This is only part of the full table; we have only highlighted the relevant
part that we will be making use of.
The Chi-Square Test: An Example
In our case, we need χ2 with 5 degrees of freedom. It follows from the
table that the probability of 14.2 or more is slightly more than 1%. Using
modern-day computational power, we can get the answer in an instant: 1.4%.
In any event, the statistician’s work here is done! There is strong
evidence that the gambler pleading innocence is, in fact, using a loaded die!
The χ2 -Test
When testing a hypothesis on a trial with multiple categories (tickets), use
the χ2 -test. The steps of the test are outlined below:
(i) Create the chance model (box model).
(ii) Create a frequency table consisting of observed frequency and
expected frequency for each category (ticket).
(iii) Compute the χ2 -statistic.
(iv) Compute the degrees of freedom: the number of categories − 1.
(v) Obtain a P-value from the χ2 -table and consider rejecting H0 .
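A minimal sketch of steps (ii)-(v) with scipy.stats.chisquare (not part of the lecture), reusing the loaded-die data from the earlier example:

from scipy.stats import chisquare

observed = [4, 6, 17, 16, 8, 9]    # observed frequencies for faces 1-6
expected = [10] * 6                # expected under H0: fair die, 60 rolls

stat, p = chisquare(f_obs=observed, f_exp=expected)
print(stat, p)                     # chi-square about 14.2, P-value about 0.014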
Example (Grand Juries)
A study of grand juries formed in Alameda County, California was
investigating if the age of jurors chosen is representative of the age of the
population. The size of a grand jury varies, but a total of 66 were sampled
(representing 6 juries).
Age            County Percentage   Number of Jurors
21 to 40              42                  15
41 to 50              23                  14
51 to 60              16                  19
61 and over           19                  18
Total:               100                  66
Does the age composition of juries represent the county?
Example (Grand Juries)
Solution: We use the χ2 -test with a null hypothesis assuming juries
represent the age composition of the county. Since α was not specified, let
α = 0.05.
(i) The box model:
(ii) The frequency table for 66 jurors:
Age            Expected   Observed
21 to 40         27.7         15
41 to 50         15.2         14
51 to 60         10.6         19
61 and over      12.5         18
Total:           66           66
Example (Grand Juries)
(iii) The χ²-statistic:
χ² = (15 − 27.7)²/27.7 + (14 − 15.2)²/15.2 + (19 − 10.6)²/10.6 + (18 − 12.5)²/12.5 ≈ 15
(iv) There are 4 categories so 3 degrees of freedom.
(v) From the χ2 -table, the P-value is less than 0.5% and approaches 0%.
We reject the null hypothesis as our significance level was assumed to be
5%. We have a statistically significant sample pointing towards bias in
choosing older jurors.
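A software check of this computation (a sketch; scipy.stats.chisquare computes the same statistic as the hand calculation above):

from scipy.stats import chisquare

observed = [15, 14, 19, 18]
expected = [27.7, 15.2, 10.6, 12.5]   # 66 jurors split by the county percentages
stat, p = chisquare(f_obs=observed, f_exp=expected)
print(stat, p)                        # chi-square about 15, P-value well below 0.5%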
Uses of χ2 -Curve
There are several other uses of χ2 -curve:
χ2 -statistic can be used to test independence.
χ2 -statistic can be used with any number of categories!
χ2 -statistic can be used to test a claim about a standard deviation.
Independent Experiments
If experiments are performed independently, the results can be pooled with
separate χ2 -statistics and degrees of freedom.
Example: Assume experiment A is performed independently of experiment
B.
Assume A has χ2 = 5.8 with 5 degrees of freedom and B has χ2 = 3.1
with 2 degrees of freedom.
The combined experiment A + B has χ2 = 5.8 + 3.1 = 8.9 and 5 + 2 = 7
degrees of freedom.
Independence Testing
The χ2 -test is able to test for independence. This will be highlighted
through examples.
The HANES study of 2,237 Americans between the ages of 25 and 34
recorded the gender and dominant hand of subjects.
                  Men     Women    Total:
Right-Handed      934     1,070     2,004
Left-Handed       113        92       205
Ambidextrous       20         8        28
Total:          1,067     1,170     2,237
Assume that subjects were chosen in a simple random sample. From this
sample, is dominant hand independent from gender?
Independence Testing
We have a null hypothesis:
H0 : Dominant Hand and Gender are Independent
We have an alternative hypothesis:
HA : Dominant Hand and Gender are Dependent
We do not know the population parameters with respect to dominant hand
and gender, only the information given by the sample. We have a large
sample, so we will assume the population matches the sample.
                  Men     Women
Right-Handed     87.5%     91.5%
Left-Handed      10.6%      7.9%
Ambidextrous      1.9%      0.7%
Independence Testing
Using H0 , the hypothesis that hand dominance and gender are
independent, we are able to construct a table for observed and expected
frequencies:
                Observed Men   Observed Women   Expected Men   Expected Women
Right-Handed         934             1,070            956             1,048
Left-Handed          113                92             98               107
Ambidextrous          20                 8             13                15
How many degrees of freedom are there? When testing for
independence in an m × n table, there are (m − 1) × (n − 1) degrees of
freedom.
In this example we start with a 3 × 2 table; 3 rows and 2 columns.
Therefore, we have (3 − 1) · (2 − 1) = 2 degrees of freedom.
Independence Testing
We now find the χ2 -statistic:
χ² = Σ (observed − expected)²/expected
   = (934 − 956)²/956 + (1,070 − 1,048)²/1,048 + (113 − 98)²/98 + (92 − 107)²/107
     + (20 − 13)²/13 + (8 − 15)²/15
χ² ≈ 12
We have yet to set a significance level, though the P-value for a
χ²-statistic of 12 with 2 degrees of freedom is less than 0.5%.
The sample provides strong statistical evidence against the null hypothesis:
gender and dominant hand appear to be dependent.
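The same test can be run directly from the counts with scipy.stats.chi2_contingency, which builds the expected table and degrees of freedom itself (a sketch, not part of the lecture):

from scipy.stats import chi2_contingency

table = [[934, 1070],   # right-handed: men, women
         [113, 92],     # left-handed
         [20, 8]]       # ambidextrous

stat, p, dof, expected = chi2_contingency(table)
print(stat, p, dof)   # chi-square about 11.8 (about 12 with the rounded expected
                      # counts above), 2 degrees of freedom, P-value below 0.5%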
Z -Test or χ2 -Test?
When should the χ2 -test be used, as opposed to the z-test?
The z-test says whether the data are like the result of drawing at
random from a box whose average is given.
The χ2 -test says whether the data are like the result of drawing at
random from a box whose contents are given.
The z-test deals with averages.
The χ2 -test deals with frequencies from all categories; this test is more
comprehensive and deals with the balance expected from the model.
Example (from Statistics by Samuels et al.)
A cross between white and yellow summer squash gave progeny of the
following colors:
COLOR              WHITE    YELLOW    GREEN
No. of progeny       155       40        10
Question: Are these data consistent with the 12 : 3 : 1 ratio predicted by a
certain genetic model? (Use a χ2 -test with α = 0.10.)
Example (Squash)
There are three categories involved here. According to the given data:
          Observed Frequency    Expected Frequency
White            155            (12/16) · 205 = 153.75
Yellow            40             (3/16) · 205 = 38.44
Green             10             (1/16) · 205 = 12.81
The χ²-statistic is
χ² = sum of (observed frequency − expected frequency)²/expected frequency
   = (155 − 153.75)²/153.75 + (40 − 38.44)²/38.44 + (10 − 12.81)²/12.81
   = 0.689.
Example (Squash)
Recall that there is the null hypothesis (H0 : no change in ratio) versus
(Ha : significant change in ratio).
We want to compare the area under χ2 with 2 = 3 − 1 degrees of freedom
with α = 0.10.
The probability is given by
P(χ²₂ > 0.689), which is bigger than 0.10.
Therefore, we cannot reject the null hypothesis based on the data at
α = 0.10.
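A quick check of this example in SciPy (a sketch):

from scipy.stats import chisquare

observed = [155, 40, 10]
expected = [12 / 16 * 205, 3 / 16 * 205, 1 / 16 * 205]   # 153.75, 38.44, 12.81
stat, p = chisquare(f_obs=observed, f_exp=expected)
print(stat, p)   # chi-square about 0.69, P-value about 0.71, far above alpha = 0.10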