Chapter 7
Estimation and testing
The researcher can never be certain that the observations are uncontaminated by
error. No matter how carefully a study is planned and conducted, a multitude of
unintended and unwanted influences produces effects of unknown magnitude in the
data. These unintended differences or biases make interpretation of the results of
research quite difficult and introduce into the interpretation a degree of uncertainty
which cannot be eliminated. Conclusions should therefore be stated in probabilistic
terms.
To estimate the value of a population parameter, one can use information from
the sample in the form of an estimator. Estimators are calculated using information
from the sample observations.
An estimator is a rule, usually expressed as a formula, that tells us how to
calculate an estimate based on the information in the sample.
Estimators are used in two different ways:
Point estimator: Based on sample data, a single number is calculated to
estimate the population parameter.
Interval estimator: Based on sample data, two numbers are calculated to
form an interval within which the parameter is expected to lie with a certain
probability.
The aim of statistical inference is to make certain determinations with regard
to the unknown parameters figuring in the underlying distribution. This is to be
done on the basis of data, represented by the observed values of a random sample
drawn from said distribution.
7.1 Sampling distributions
What exactly does the sample, an often tiny subset, tell us about the population?
We can never observe the whole population, even if finite, except at enormous
expense so the population mean and variance and indeed any aspect of the population distribution can never be known exactly. We call these unknown population
quantities parameters and use Greek letters to denote them: µ(‘mu’) is the symbol commonly used for the population mean and σ(‘sigma’) for the population
standard deviation. As we have a sample of n observations we need to ask the
question: Is x̄ a ‘good’ estimate of µ? We know that for almost all samples
x̄ is not equal to µ but how close is it? Can we answer this question without
knowing what µ is? Does x̄ get closer to µ as n increases? We need to study
the properties of the sample mean as an estimator of the population mean and
we achieve this by looking at the values x̄ can take over all possible samples: the
sampling distribution. Of course we can never examine all possible samples but
the easy availability of a statistical package like Splus enables us to study sampling
properties much more readily. We can actually illustrate the theoretical results of
this chapter by conducting a simulation experiment.
A parameter is a numerical characteristic of the population of interest. Parameters are usually unknown and we make inferences on them using the sample
data.
(Examples: p, the probability of ‘success’ is a parameter of a Binomial population distribution. The rate of ‘failure’ λ is a parameter of a Poisson distribution
and also of an exponential distribution of ‘lifetimes’ i.e. time to ‘failure’ in a Poisson
process where ‘failures’ occur randomly in time.)
The population mean µ (‘mu’) is a common parameter. If the population
is modelled or described by a p.d.f. (probability density function) fX (x) for a
continuous variable X then
µ = E[X] = ∫ x fX(x) dx
If however X is a discrete random variable with probability function (or p.d.f.)
pX (x) = P [X = x] then
µ = E[X] = Σ x pX(x)
Other parameters measuring location can be defined in terms of the c.d.f. FX (x) =
P [X ≤ x]: for example the population median M and the upper and lower quartiles
Q3 and Q1 respectively.
The population variance σ 2 (‘sigma-squared’) is a common parameter measuring variability:
σ² = Var[X] = E[(X − µ)²] = E[X²] − µ² = ∫ x² fX(x) dx − µ²
(for X continuous, and similarly in the discrete case.)
An estimate is a statistic that we hope will be ‘near’ to the parameter of
interest. (For example, x̄ is an estimate of µ.)
An estimator is a rule for calculating an estimate from any sample, usually a
random sample.
A statistic is a random variable (r.v.) whose value is determined once the
sample data have been observed. Thus an estimator is a r.v. but, in general, a r.v.
need not estimate anything.
Use upper case letters for r.v.s and the corresponding lower case letter for the
values taken by the r.v.s. If Xi is the r.v. denoting the measurement of variable X
on unit i in the sample (i = 1, 2, . . . , n) then
X̄ = (1/n) Σ_{i=1}^{n} Xi   and   S² = (1/(n − 1)) Σ_{i=1}^{n} (Xi − X̄)²
are the sample mean and variance respectively, considered as r.v.s.
The sampling distribution of a r.v. is the collection or distribution of all
possible values of the r.v. over all possible samples. The properties of the sampling
distributions of these two r.v.s determine how we make inferences on the unknown
µ (and σ 2 ) from any sample.
If the sample is a random sample of size n from an infinite population then
X1 , X2 , . . . , Xn are independent r.v.s each with the same distribution (i.e. same
p.d.f. or probability function) as the population so that
E[Xi] = µ and Var[Xi] = σ²   (i = 1, 2, . . . , n)
The main result of this section is the following theorem:
Theorem 7.1
Averaging over all random samples of size n from an arbitrary population with
mean µ and variance σ 2 , the sample mean X̄ and sample variance S 2 have the
following three properties:
E[X̄] = µ, i.e. X̄ is an unbiased estimator of µ;
Var[X̄] = σ²/n, i.e. the variability of X̄ as an estimator decreases with n;
E[S²] = σ², i.e. S² is an unbiased estimator of σ².
Thus S 2 /n is used as an unbiased estimator of the variability or variance of
X̄ as an estimator of µ. It is vital in statistics to have such an estimate so that
inferences using probability can be made.
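These three properties can be checked by the simulation experiment mentioned above. The following R sketch (the Normal population, sample size and number of replications are arbitrary choices for illustration) draws many random samples and compares the averages of x̄ and s² with µ, σ²/n and σ²:

# Sampling-distribution simulation: illustrates Theorem 7.1
set.seed(1)
mu <- 10; sigma <- 2; n <- 25; nrep <- 10000
xbar <- numeric(nrep); s2 <- numeric(nrep)
for (i in 1:nrep) {
  x <- rnorm(n, mean = mu, sd = sigma)   # one random sample of size n
  xbar[i] <- mean(x)
  s2[i] <- var(x)                        # divisor n - 1
}
mean(xbar)   # close to mu:         E[X-bar] = mu
var(xbar)    # close to sigma^2/n:  Var[X-bar] = sigma^2/n
mean(s2)     # close to sigma^2:    E[S^2] = sigma^2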
The theorem shows that σ/√n is the standard deviation of the sampling distribution of X̄ as an estimator of µ. That is, σ/√n measures the variability of
possible estimates x̄ about the ‘true’ population mean µ. A sample estimate of this
variability is s/√n, called the (estimated) standard error of the (sample) mean.
As the sample size increases, but with the sample still random, the variability
or uncertainty in our estimate of µ decreases monotonically with a limit of zero,
i.e. knowledge without uncertainty as n → ∞.
We can show using some rather difficult results in probability theory that X̄ →
µ as n → ∞ with probability 1, with the intuitive interpretation that we are certain
to arrive at the ‘true’ value as the sample size increases indefinitely. This is the
Strong Law of Large Numbers and is not discussed further here. Theorem 5
allows us to prove (proof not required in this course) only that
P[|X̄ − µ| < ε] → 1 as n → ∞.
This means that the sequence of real numbers giving the probability that X̄ is
within ε of µ has a limit of unity. This is called the Weak Law of Large Numbers.
Another important result for statistical inference for large sample sizes is the
Central Limit Theorem, which says that as n → ∞ the sampling distribution
of X̄ tends to a Normal distribution with the same mean and variance. We can
demonstrate this empirically using R. The importance of this result is that we
do not need to know the form or type of the original population distribution if
our sample size is sufficiently large. We can use instead the Normal distribution
for statistical inference with the knowledge that the probabilities we calculate will
be good approximations to the true (but generally unknown) probabilities. Recall
the Normal approximations to the Binomial and Poisson distributions. See the
following chapter for sections for the tests of hypotheses and methods of estimation
such as confidence intervals. These are the techniques statistical inference applies
to real data. Using the symbol ‘∼’ to mean ‘is distributed as’, we write the Central
Limit Theorem
X̄ ∼ N(µ; σ²/n) approximately, for large n.
Then using the properties of the Normal distribution we can say that, for large n,

P[ (X̄ − µ)/(σ/√n) > u ]

can be found approximately (using, say, NCST) for any specified value u without
knowing the original form of the population.
Suppose once again that Y = Σ_{i=1}^{n} Xi, where the Xi are independent r.v.s.
When n is large, the central limit theorem (CLT), roughly speaking, says that if
the Xi have mean µi and variance σi² respectively, then approximately

Y ∼ N( Σ µi , Σ σi² ).
This result helps to explain the importance of the Normal distribution in statistics.
In particular,

P(Y ≤ y) ≈ Φ( (y − Σ µi) / (Σ σi²)^{1/2} ).
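As remarked above, the Central Limit Theorem can be demonstrated empirically using R. A minimal sketch (the exponential population and the sample size are arbitrary illustrative choices): the standardized sample means are close to a standard Normal distribution even though the population is heavily skewed.

# Central Limit Theorem demonstration with a skewed (exponential) population
set.seed(2)
n <- 30                                              # sample size (illustrative)
means <- replicate(10000, mean(rexp(n, rate = 1)))   # population mean 1, sd 1
z <- (means - 1) / (1 / sqrt(n))                     # standardize by mu and sigma/sqrt(n)
hist(z, breaks = 50, freq = FALSE)
curve(dnorm(x), add = TRUE)                          # standard Normal density for comparison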
If we do know the form of the population and it follows a Normal distribution,
then for any sample size n > 1 it can be shown that
X̄ ∼ N(µ; σ²/n).

Thus

Z = (X̄ − µ)/(σ/√n) ∼ N(0, 1)
has a sampling distribution which is standard Normal for any n. As the population
standard deviation σ is often unknown, replacing it by the corresponding sample
quantity s changes the sampling distribution. However, provided the underlying
population is Normal, it can be shown that
T = (X̄ − µ)/(s/√n)
has a ‘Student’s t-distribution’ with ν ‘degrees of freedom’, where ν = n − 1 (named
after W. S. Gossett, who took the pseudonym ‘Student’). The percentiles of this
distribution are given in Table 10 ; thus tν (0.25) is the 75th percentile or upper
quartile whose value for different ν is given by the third column of figures in the
main body of Table 10. These percentiles will be used extensively in the next
section for statistical inference on Normal populations.
Another distribution which arises from random samples of Normal populations
is the ‘chi-square’ distribution, whose percentage points are given in Table 8. It
can be shown that
V = (n − 1)s²/σ² ∼ χ²_{n−1},
the chi-square distribution with n − 1 degrees of freedom, whatever the value of
X̄. Using some rather tricky distribution theory we can derive the t-distribution
mentioned above. Yet another distribution is the (Fisher) F -distribution with
percentage points in Table 12. The F - and χ2 distributions are used for statistical
inference on the variances of Normal populations as well as for wider application
in Goodness-of-Fit tests later in this chapter.
Example 7.1
Suppose we select a random sample Xi of size n from a N (µ, σ 2 ) population and
calculate the mean of the sample, X̄. Suppose σ = 70mm and I wish to estimate
the mean height in mm of a certain population on the basis of a sample of size n.
What is the probability the sample mean is within 10mm of the population mean?
How large must n be to make
P (µ − 10 < X̄ ≤ µ + 10) = 0.9
Solution
X̄ ∼ N(µ, σ²/n).

P(µ − 10 < X̄ ≤ µ + 10) = Φ( (µ + 10 − µ)/(70/√n) ) − Φ( (µ − 10 − µ)/(70/√n) ) = 2Φ(√n/7) − 1

If n = 1, P(µ − 10 < X̄ ≤ µ + 10) = 0.12.
If n = 10, P(µ − 10 < X̄ ≤ µ + 10) = 0.35.
If n = 100, P(µ − 10 < X̄ ≤ µ + 10) = 0.85.

For the probability to equal 0.9 we need
2Φ(√n/7) − 1 = 0.9, i.e. P(z < √n/7) = 0.95,
√n/7 = 1.6449,
n ≈ 133.
△
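The probabilities and the required sample size in this example can be checked numerically; a short R sketch:

# P(mu - 10 < X-bar <= mu + 10) = 2*Phi(sqrt(n)/7) - 1 when sigma = 70
prob <- function(n) 2 * pnorm(sqrt(n) / 7) - 1
prob(c(1, 10, 100))        # compare with the values quoted above
(7 * qnorm(0.95))^2        # n needed for probability 0.9: about 132.5, so n = 133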
7.2 Point estimation
One of the first tasks a statistician or an engineer undertakes when faced with
data is to try to summarize or describe the data in some manner. Some of the
statistics (sample mean, sample variance, etc.) we covered can be used as descriptive measures for our sample. In this section, we look at methods to derive and to
evaluate estimates of population parameters. There are several methods available
for obtaining parameter estimates; we will discuss the maximum likelihood method
and the method of moments.
Typically, population parameters can take on values from a subset of the real
line. For example, the population mean can be any real number, −∞ < µ < ∞,
and the population standard deviation can be any positive real number, σ > 0.
The set of all possible values for a parameter θ is called the parameter space.
The data space is defined as the set of all possible values of the random sample
of size n. The estimate is calculated from the sample data as a function of the
random sample. An estimator is a function or mapping from the data space to the
parameter space and is denoted as
T = t(X1 , ..., Xn ).
Since an estimator is calculated using the sample alone, it is a statistic. Furthermore, if we have a random sample, then an estimator is also a random variable.
This means that the value of the estimator varies from one sample to another based
on its sampling distribution. In order to assess the usefulness of our estimator, we
need to have some criteria to measure the performance. We discuss four criteria
used to assess estimators: bias, mean squared error, efficiency, and standard error.
In this discussion, we only present the definitional aspects of these criteria.
Bias
The bias in an estimator gives a measure of how much error we have, on average,
in our estimate when we use T to estimate our parameter θ. The bias is defined
as
bias(T ) = E[T ] − θ.
If the estimator is unbiased, then the expected value of our estimator equals the
true parameter value, so E[T ] = θ.
To determine the expected value E[T ], we must know the distribution of the
statistic T . In these situations, the bias can be determined analytically. When
the distribution of the statistic is not known, then we can use special methods to
estimate the bias of T .
Mean Square Error
Let θ denote the parameter we are estimating and T denote our estimate, then the
mean squared error (MSE) of the estimator is defined as
MSE(T) = E[(T − θ)²].

Thus, the MSE is the expected value of the squared error. We can write this
in more useful quantities such as the bias and variance of T. If we expand the
expected value on the right-hand side of the above equation, then we have

MSE(T) = E[T² − 2Tθ + θ²] = E[T²] − 2θE[T] + θ².

By adding and subtracting (E[T])², we have the following:

MSE(T) = E[T²] − (E[T])² + (E[T])² − 2θE[T] + θ².

The first two terms are the variance of T, and the last three terms equal the
squared bias of our estimator. Thus, we can write the mean squared error as

MSE(T) = E[T²] − (E[T])² + (E[T] − θ)² = Var[T] + (bias(T))²
Since the mean squared error is based on the variance and the squared bias, the
error will be small when the variance and the bias are both small. When T is
unbiased, then the mean squared error is equal to the variance only. The concepts
of bias and variance are important for assessing the performance of any estimator.
Standard error
We can get a measure of the precision of our estimator by calculating the standard
error. The standard error of an estimator (or a statistic) is defined as the standard
deviation of its sampling distribution:
SE(T) = √(V(T)) = σ_T
To illustrate this concept, let’s use the sample mean as an example. We know that
the variance of the estimator is

V[X̄] = (1/n) σ²,

for large n. So, the standard error is given by

SE(X̄) = σ_X̄ = σ/√n.
If the standard deviation σ for the underlying population is unknown, then we can
substitute an estimate for the parameter. In this case, we call it the estimated
standard error:
ŜE(X̄) = σ̂_X̄ = S/√n.
This estimate is also a random variable and has a probability distribution associated with it.
If the bias in an estimator is small, then the variance of the estimator is approximately equal to the MSE, V (T ) ≈ M SE(T ). Thus, we can also use the square
root of the MSE as an estimate of the standard error.
Maximum Likelihood Estimation
A maximum likelihood (ML) estimator is that value of the parameter (or parameters) that maximizes the likelihood function of the sample. The likelihood
function of a random sample of size n from density function f (x; θ) is the joint
probability density function, denoted by
L(θ; x1, ..., xn) = f(x1, ..., xn; θ).

This equation provides the likelihood that the random variables take on the particular
values x1, ..., xn. Note that the likelihood function L is a function of the
unknown parameter θ, and that we allow θ to represent a vector of parameters.
If we have a random sample (independent, identically distributed random variables), then we can write the likelihood function as
L(θ) = L(θ; x1, ..., xn) = ∏_{i=1}^{n} f(xi; θ),
which is the product of the individual density functions evaluated at each sample
point.
In most cases, to find the value θ̂ that maximizes the likelihood function, we
take the derivative of L, set it equal to 0 and solve for θ. Thus, we solve the
following likelihood equation
dL(θ)/dθ = 0.
It can be shown that the likelihood function, L(θ), and logarithm of the likelihood
function, ln L(θ), have their maxima at the same value of θ. It is sometimes easier
to find the maximum of ln L(θ), especially when working with an exponential
function:

l(θ) = ln L(θ).

Then we solve the following equation:

dl(θ)/dθ = 0.
However, keep in mind that a solution to the above equation does not imply
that it is a maximum; it could be a minimum. It is important to check that the
solution is indeed a maximum before using the result as a maximum likelihood estimate.
When a distribution has more than one parameter, then the likelihood function
is a function of all parameters that pertain to the distribution. In these situations,
the maximum likelihood estimates are obtained by taking the partial derivatives
of the likelihood function (or ln L(θ)), setting them all equal to zero, and solving
the system of equations.
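When the likelihood equation has no convenient closed-form solution, the maximization can be carried out numerically. A hedged R sketch, using an exponential sample purely as an illustration (its MLE 1/x̄ is known, so it serves as a check on the numerical answer):

# Numerical maximum likelihood for an exponential rate (illustrative example)
set.seed(3)
x <- rexp(50, rate = 2)                                  # simulated data, true rate 2
negloglik <- function(lambda) -sum(dexp(x, rate = lambda, log = TRUE))
fit <- optimize(negloglik, interval = c(0.001, 100))     # minimize -l(lambda)
fit$minimum    # numerical MLE of lambda
1 / mean(x)    # closed-form MLE; should agree closely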
Example 7.2
In this example, we derive the maximum likelihood estimators for the parameters
of the normal distribution.
Solution
We start off with the likelihood function for a random sample of size n given
by
L(θ) = ∏_{i=1}^{n} (1/(σ√(2π))) exp( −(xi − µ)²/(2σ²) ) = (1/(2πσ²))^{n/2} exp( −(1/(2σ²)) Σ_{i=1}^{n} (xi − µ)² ).

l(θ) = ln(L(θ)) = ln[ (1/(2πσ²))^{n/2} ] + ln[ exp( −(1/(2σ²)) Σ_{i=1}^{n} (xi − µ)² ) ]

This simplifies to

l(θ) = −(n/2) ln[2π] − (n/2) ln[σ²] − (1/(2σ²)) Σ_{i=1}^{n} (xi − µ)²

with σ > 0 and −∞ < µ < ∞. The next step is to take the partial derivatives with
respect to µ and σ². These derivatives are

∂l/∂µ = (1/σ²) Σ_{i=1}^{n} (xi − µ),

and

∂l/∂σ² = −n/(2σ²) + (1/(2σ⁴)) Σ_{i=1}^{n} (xi − µ)².

We then set the equations equal to zero and solve for µ and σ². Solving the first
equation for µ, we get the familiar sample mean for the estimator:

(1/σ²) Σ_{i=1}^{n} (xi − µ) = 0
Σ_{i=1}^{n} xi = nµ
µ̂ = (1/n) Σ_{i=1}^{n} xi = x̄

Substituting µ̂ = x̄ and solving for the variance, we get

−n/(2σ²) + (1/(2σ⁴)) Σ_{i=1}^{n} (xi − x̄)² = 0

σ̂² = (1/n) Σ_{i=1}^{n} (xi − x̄)²
△
We know that E[X̄] = µ, so the sample mean is an unbiased estimator of
the population mean. However, that is not the case for the maximum likelihood
estimate of the variance:

E[σ̂²] = (n − 1)σ²/n,

so that the maximum likelihood estimate, σ̂², of the variance is biased. If we
want to obtain an unbiased estimator for the variance, we simply multiply our
maximum likelihood estimator by n/(n − 1). This yields the statistic for the sample
variance given by

s² = (1/(n − 1)) Σ_{i=1}^{n} (xi − x̄)²
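A short R sketch contrasting the two estimates on a simulated Normal sample (the parameters are arbitrary illustrative choices): the ML estimate divides by n, while var() applies the n/(n − 1) correction.

# MLE of the normal parameters versus the unbiased sample variance
set.seed(4)
x <- rnorm(30, mean = 5, sd = 3)
mu.hat <- mean(x)                          # MLE of mu (the sample mean)
sigma2.mle <- mean((x - mu.hat)^2)         # MLE of sigma^2, divisor n
s2 <- var(x)                               # sample variance, divisor n - 1
sigma2.mle * length(x) / (length(x) - 1)   # equals s2: the bias correction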
Methods of moments
In some cases, it is difficult to find the maximum of the likelihood function. The
method of moments is one way to approach this problem. In general, we write the
unknown population parameters in terms of the population moments.
Let X1 , X2 , ..., Xn be a random sample from the probability distribution f (x).
The k th population moment is E[X k ], k = 1, 2, .... The corresponding k th sample
moment is

(1/n) Σ_{i=1}^{n} Xi^k,   k = 1, 2, ...
The moment estimators are found by replacing the population moments with
the corresponding sample moments.
Exercises
Exercise 7.1
In terms of a random sample of size n, from the binomial B(1, θ) distribution
with observed values x1 , ..., xn , determine the MLE θ̂ = θ̂(x) of θ ∈ (0, 1), x =
(x1 , ..., xn ).
Solution
X ∼ B(1, θ), so that xi can take values 0 or 1.
f(xi, θ) = θ^{xi} (1 − θ)^{1 − xi},  i = 1, ..., n.

L(θ) = ∏_{i=1}^{n} f(xi, θ) = ∏_{i=1}^{n} θ^{xi} (1 − θ)^{1 − xi} = θ^{Σ xi} (1 − θ)^{n − Σ xi}

l(θ) = ln L(θ) = (Σ_{i=1}^{n} xi) ln θ + (n − Σ_{i=1}^{n} xi) ln(1 − θ)

dl/dθ = (Σ_{i=1}^{n} xi)/θ − (n − Σ_{i=1}^{n} xi)/(1 − θ) = 0

(Σ_{i=1}^{n} xi)(1 − θ) − (n − Σ_{i=1}^{n} xi) θ = 0

Σ_{i=1}^{n} xi − nθ = 0

θ̂_ML = (Σ_{i=1}^{n} xi)/n = x̄
△
7.3 Confidence Intervals
A confidence interval allows us to make statements concerning the likely range that
a population parameter (such as the mean) lies within.
A single statistic could be used as an estimate for a population parameter (commonly
referred to as a point estimate). A single value, however, would not reflect any
degree of confidence in the estimate, so instead we quote a range of values within
which the parameter is believed to lie. This range of values is referred to as the
confidence interval.
The confidence interval depends not only on the number of observations collected
but also on the required degree of confidence in the range. If we wish to make a
more confident statement, we have to make the range larger. The required degree
of confidence is expressed as the confidence level at which the estimate is to be
calculated. Commonly used confidence levels include 0.9, 0.95, and 0.99.
Confidence level α   0.99   0.98   0.95   0.9     0.5
Φ⁻¹(α/2)             2.58   2.33   1.96   1.645   0.6745
The confidence limits for the population mean are given by
x̄ ± Φ⁻¹(α/2) σ/√n
In general, the population standard deviation σ is unknown, so that to obtain the
confidence limits, we use the estimator s2 . In this case we use the t-distribution to
obtain confidence intervals. In general the confidence limits for population means
are given by
x̄ ± t⁻¹_{n−1}(α/2) s/√n
For n > 30, Φ⁻¹(α/2) and t⁻¹_{n−1}(α/2) are practically equal.
Suppose that the statistic is the proportion of successes in a sample of size n > 30
drawn from a binomial population in which p is the probability of success. Then
the confidence limits for p are given by:

p̂ ± Φ⁻¹(α/2) √( p̂(1 − p̂)/n )
A confidence interval for the population variance σ² is

( (n − 1)s²/χ²_{n−1}(α/2) ,  (n − 1)s²/χ²_{n−1}(1 − α/2) )
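As an illustration, a minimal R sketch computing the t-based limits for a generic numeric vector x (the data here are simulated placeholders, not from the text):

# Confidence limits for a population mean when sigma is unknown
set.seed(5)
x <- rnorm(20, mean = 50, sd = 10)          # placeholder data
n <- length(x); xbar <- mean(x); s <- sd(x)
conf <- 0.95
tcrit <- qt(1 - (1 - conf) / 2, df = n - 1)
xbar + c(-1, 1) * tcrit * s / sqrt(n)       # confidence limits
t.test(x, conf.level = conf)$conf.int       # built-in check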
Example 7.3
Suppose there are two political parties, A and B. An opinion poll is carried out
based on 1,000 individuals, of whom a proportion 0.53 voted for candidate A. Find a
95% confidence interval for the polling result.
Solution
p̂ ± Φ⁻¹(α/2) √( p̂(1 − p̂)/n )
0.53 ± 1.96 × 0.016
[0.50, 0.56]
△
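The same interval can be checked in R:

# Approximate 95% confidence interval for the poll proportion
phat <- 0.53; n <- 1000
phat + c(-1, 1) * qnorm(0.975) * sqrt(phat * (1 - phat) / n)   # roughly (0.50, 0.56)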
Exercises
Exercise 7.2
The stray-load loss (in watts) for a certain type of induction motor, when the line
current is held at 10 amps for a speed of 1,500 rpm, is a r.v. X ∼ N (µ, 9).
1. Compute a 99% confidence interval for µ when n = 100 and x̄ = 58.3.
2. Determine the sample size n, if the length of the 99% confidence interval is
required to be 1.
Solution
X ∼ N (µ, 9)
1. x̄ ± Φ⁻¹(α/2) σ/√n
   58.3 ± 2.58 × 3/√100 = 58.3 ± 0.774
   The 99% confidence interval is (57.526, 59.074).

2. Φ⁻¹(α/2) σ/√n = 0.5
   2.58 × 3/√n = 0.5
   √n = 15.48
   n ≈ 240
△
7.4 Hypothesis Testing

7.4.1 Introduction
Very often in practice we need to make a decision about a population based on
information from the sample. For example, we may wish to decide whether a new
serum cures a disease more effectively, or whether one procedure is better than another.
In an attempt to reach a decision, it is useful to make assumptions or guesses
about the population involved.
A hypothesis test determines whether the data collected support a specific
claim.
A statistical test of hypothesis consists of five parts:
1. The null hypothesis, denoted by H0
2. The alternative hypothesis, denoted by H1
3. The test statistic: a single number calculated from the sample
4. The p-value: a probability computed from the distribution which the test
statistic (approximately) follows
5. The conclusion: there is, or there is not, evidence to reject H0
The null hypothesis H0 is a claim that a particular population parameter (e.g. the mean)
equals a specific value. A hypothesis test will either reject or not reject the null
hypothesis using the collected data.
Alternative hypothesis H1 is the conclusion that we would be interested in
reaching if the null hypothesis is rejected. There are three options: not equal to,
greater than or less than.
Once the null hypothesis and the alternative hypothesis have been described,
it is now possible to assess the hypotheses using the data collected. First, the
statistic of interest from the sample is calculated. Next, a hypothesis test will look
at the p-value.
A p-value is the probability of getting the recorded value or a more extreme
value. This is the area under the probability density function to the right
(for positive values) or to the left (for negative values) of the test statistic. To
calculate a p-value, use the score calculated from the test statistic and look up
the score in tables of the standardized normal distribution.
To interpret p-values in a consistent way, we adopt a convention which gives
the following interpretations:
p > 0.1 very weak or no evidence against the null hypothesis
0.05 < p < 0.1 slight or weak evidence against the null hypothesis
0.01 < p < 0.05 moderate evidence against the null hypothesis
0.001 < p < 0.01 strong evidence against the null hypothesis
p < 0.001 very strong or overwhelming evidence against the null
hypothesis.
However the exact interpretation and appropriate action to be taken must
obviously vary according to the problem at hand.
Errors
Since a hypothesis test is based on a sample and samples vary, there exists the
possibility for errors. There are two potential errors and these are described as:
Type I Error: In this situation the null hypothesis is rejected when it really
should not be. These errors are made less likely by rejecting H0 only when the
p-value is very small.
Type II Error: In this situation the null hypothesis is not rejected when
it should have been. These errors are minimized by increasing the number of
observations in the sample.
7.4.2 Single sample
Consider a sample which comes from a population described by a normal
distribution with mean µ and variance σ². Suppose that we know the population
variance σ², or that we have a large number of observations so that the Central
Limit Theorem can be applied. We are interested in whether the data from the
sample support a hypothesis that the population mean µ takes a certain value µ0.
Single sample, population variance known, or sample size n ≥ 30
1. Null hypothesis: H0 : µ = µ0
2. Alternative hypothesis:
   One-tailed test: H1 : µ > µ0  (or H1 : µ < µ0)
   Two-tailed test: H1 : µ ≠ µ0

3. Test statistic:
   Z = (x̄ − µ0)/(σ/√n)
   If σ is unknown and n > 30, substitute the sample standard deviation s for the
   population standard deviation σ.

4. p-value from the normal distribution:
   One-tailed test: p = P(z ≥ Z)  (or p = P(z ≤ Z))
   Two-tailed test: p = 2 × P(z ≥ Z)

5. Conclusion: based on the calculated p-value
Example*
Burning rate of a solid propellant used to power aircrew escape systems is a random
variable that can be described by a normal distribution with the unknown mean
and the standard deviation σ = 2.5 centimeters per second. Given that a mean
from 10 samples is 48.5 centimeters per second, test the hypothesis whether or not
the mean burning rate is 50 centimeters per second.
Solution
H0 : µ = 50
H1 : µ ≠ 50
Two-tailed test.
Test statistic: Z = (x̄ − µ0)/(σ/√n) = (48.5 − 50)/(2.5/√10) = −1.90
p = P(|Z| > 1.9) = 2 × (1 − P(Z < 1.9)) = 0.0574
There is weak evidence against H0, i.e. against the hypothesis that the mean burning
rate is 50 centimeters per second.
△
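A short R sketch reproducing this calculation (base R has no built-in z-test, so the statistic and p-value are computed directly):

# Two-sided z-test for the mean burning rate
xbar <- 48.5; mu0 <- 50; sigma <- 2.5; n <- 10
Z <- (xbar - mu0) / (sigma / sqrt(n))   # about -1.90
p <- 2 * (1 - pnorm(abs(Z)))            # about 0.057
c(Z = Z, p = p)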
Single sample, population variance unknown, n < 30
1. Null hypothesis: H0 : µ = µ0
2. Alternative hypothesis:
   One-tailed test: H1 : µ > µ0  (or H1 : µ < µ0)
   Two-tailed test: H1 : µ ≠ µ0

3. Test statistic:
   T = (x̄ − µ0)/(s/√n)

4. p-value from Student's t-distribution, with n − 1 degrees of freedom:
   One-tailed test: p = P(t_{n−1} ≥ T)  (or p = P(t_{n−1} ≤ T))
   Two-tailed test: p = 2 × P(t_{n−1} ≥ T)

5. Conclusion: based on the calculated p-value
Example 7.4
The increased availability of light materials with high strength has revolutionized
the design and manufacture of golf clubs, particularly drivers. Clubs with hollow
heads and very thin faces can result in much longer tee shots, especially for players
of modest skills. This is due partly to the spring-like effect that the thin face
imparts to the ball. Firing a golf ball at the head of the club and measuring the
ratio of the outgoing velocity of the ball to the incoming velocity can quantify this
spring-like effect. The ratio of velocities is called the coefficient of restitution of the
club. An experiment was performed in which 15 drivers produced by a particular
club maker were selected at random and their coefficients of restitution measured.
In the experiment the golf balls were fired from an air cannon so that the incoming
velocity and spin rate of the ball could be precisely controlled. It is of interest
to determine if there is evidence to support a claim that the mean coefficient of
restitution exceeds 0.82. The observations follow:
0.8411  0.8580  0.8042  0.8191  0.8532
0.8730  0.8182  0.8483  0.8282  0.8125
0.8276  0.8359  0.8750  0.7983  0.8660
Solution
The sample mean and sample standard deviation are x̄ = 0.83725 and s = 0.02456.
Since the objective of the experimenter is to demonstrate that the mean coefficient
of restitution exceeds 0.82, a one-sided alternative hypothesis is appropriate.
H0 : µ = 0.82
H1 : µ > 0.82
We want to reject H0 if the mean coefficient of restitution exceeds 0.82.
The test statistic is

T = (x̄ − µ0)/(s/√n)

Computations:

T = (0.83725 − 0.82)/(0.02456/√15) = 2.72
Conclusions: p = P(t14 > 2.72) = 0.0086
There is strong evidence against the null hypothesis, i.e. the mean coefficient of
restitution exceeds 0.82.
△
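The same analysis in R, using the coefficient of restitution data and the built-in one-sample t-test with a one-sided alternative:

# One-sided one-sample t-test for the mean coefficient of restitution
cor.data <- c(0.8411, 0.8580, 0.8042, 0.8191, 0.8532,
              0.8730, 0.8182, 0.8483, 0.8282, 0.8125,
              0.8276, 0.8359, 0.8750, 0.7983, 0.8660)
t.test(cor.data, mu = 0.82, alternative = "greater")   # T about 2.72, p about 0.009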
Test Hypothesis Concerning a Population Variance
1. Null hypothesis: H0 : σ² = σ0²

2. Alternative hypothesis:
   One-tailed test: H1 : σ² > σ0²  (or H1 : σ² < σ0²)
   Two-tailed test: H1 : σ² ≠ σ0²

3. Test statistic:
   X² = (n − 1)s²/σ0²

4. p-value from the chi-square distribution, with n − 1 degrees of freedom:
   One-tailed test: p = P(χ²_{n−1} ≥ X²)  (or p = P(χ²_{n−1} ≤ X²))
   Two-tailed test: p = 2 × P(χ²_{n−1} ≥ X²)

5. Conclusion: based on the calculated p-value
Example*
An automatic filling machine is used to fill bottles with liquid detergent. A random
sample of 20 bottles results in a sample variance of fill volume of s2 = 0.0153 (fluid
ounces)2 . If the variance of fill volume exceeds 0.01 (fluid ounces)2 , an unacceptable
proportion of bottles will be underfilled or overfilled. Is there evidence in the sample
data to suggest that the manufacturer has a problem with underfilled or overfilled
bottles?
Solution
H0 : σ² = 0.01
H1 : σ² > 0.01
One-tailed test.
Test statistic: X² = (n − 1)s²/σ0² = 19 × 0.0153/0.01 = 29.07
p = P(χ²19 > 29.07) > 0.05
No evidence to reject H0, i.e. the variance is not more than 0.01 (fluid ounces)².
△
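A minimal R sketch of the variance test (computed directly, since base R has no built-in chi-square test for a single variance):

# Chi-square test for a population variance
s2 <- 0.0153; sigma0.sq <- 0.01; n <- 20
X2 <- (n - 1) * s2 / sigma0.sq     # 29.07
p <- 1 - pchisq(X2, df = n - 1)    # about 0.065, i.e. p > 0.05
c(X2 = X2, p = p)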
7.4.3 Comparing two samples
Two different situations are possible: independent samples and paired samples.
An important assumption is that the samples are taken from normal distributions,
the first of size nA from N(µA, σA²) and the second of size nB from N(µB, σB²).
The sample means X̄A and X̄B can be calculated and used to compare the
population means µA and µB.
Independent samples, population variances known, or if unknown, nA ≥ 30 and nB ≥ 30

1. Null hypothesis: H0 : (µA − µB) = µ0, where µ0 is some specific difference that
   you wish to test. For many tests, you will hypothesize that there is no difference
   between µA and µB; that is, H0 : µA = µB.

2. Alternative hypothesis:
   One-tailed test: H1 : (µA − µB) > µ0  (or H1 : (µA − µB) < µ0)
   Two-tailed test: H1 : (µA − µB) ≠ µ0

3. Test statistic:
   Z = ( (x̄A − x̄B) − µ0 ) / √( σA²/nA + σB²/nB )
   If σA² and σB² are unknown, nA > 30 and nB > 30, substitute the sample
   variances sA² and sB² for σA² and σB², respectively.

4. p-value from the normal distribution:
   One-tailed test: p = P(z ≥ Z)  (or p = P(z ≤ Z))
   Two-tailed test: p = 2 × P(z ≥ Z)

5. Conclusion: based on the calculated p-value
Example*
A product developer is interested in reducing the drying time of a primer paint.
Two formulations of the paint are tested; formulation 1 is the standard chemistry,
and formulation 2 has a new drying ingredient that should reduce the drying
time. From experience, it is known that the standard deviation of drying time is
8 minutes, and this inherent variability should be unaffected by the addition of
the new ingredient. Ten specimens are painted with formulation 1, and another 10
specimens are painted with formulation 2; the 20 specimens are painted in random
order. The two sample average drying times are x̄1 = 121 minutes and x̄2 = 112
minutes, respectively. What conclusions can the product developer draw about
the effectiveness of the new ingredient?
Solution
H0 : µ1 = µ2
H1 : µ1 > µ2
One-tailed test.
Test statistic: Z = (x̄1 − x̄2 − 0)/√(σ1²/n1 + σ2²/n2) = (121 − 112)/√(8²/10 + 8²/10) = 2.52
p = P(Z > 2.52) = 1 − P(Z < 2.52) = 0.0059
There is evidence to reject H0, i.e. the new drying ingredient reduces the drying
time.
△
If independent samples have unknown population variances, but we can assume
that these unknown population variances are equal, we construct a pooled variance
estimator, which is a weighted average of the two unbiased estimators sA² and sB²:

Independent samples with unknown, but equal variances σA = σB = σ,
with nA ≤ 30 and nB ≤ 30 (pooled variances)

1. Null hypothesis: H0 : (µA − µB) = µ0, where µ0 is some specific difference that
   you wish to test. If there is no difference between µA and µB, then H0 : µA = µB.

2. Alternative hypothesis:
   One-tailed test: H1 : (µA − µB) > µ0  (or H1 : (µA − µB) < µ0)
   Two-tailed test: H1 : (µA − µB) ≠ µ0

3. Test statistic:
   T = ( (x̄A − x̄B) − µ0 ) / ( s √(1/nA + 1/nB) ),
   where s² = ( (nA − 1)sA² + (nB − 1)sB² ) / (nA + nB − 2)

4. p-value from Student's t-distribution, with nA + nB − 2 degrees of freedom:
   One-tailed test: p = P(t_{nA+nB−2} ≥ T)  (or p = P(t_{nA+nB−2} ≤ T))
   Two-tailed test: p = 2 × P(t_{nA+nB−2} ≥ T)

5. Conclusion: based on the calculated p-value
Example*
Suppose that we have obtained X̄ = 80.02 and sx = 0.024 from the control group
of size nx = 13, and Ȳ = 79.98 and sy = 0.031 from the experimental group of size
ny = 8 and we assume that σx2 = σy2 . Test the hypothesis that the means for two
groups are equal.
Solution
H0 : µx = µy
H1 : µx ≠ µy
Two-tailed test.
s² = ( (nx − 1)sx² + (ny − 1)sy² ) / (nx + ny − 2) = (12 × 0.024² + 7 × 0.031²)/(13 + 8 − 2) = 0.000729
s = 0.027
Test statistic: T = (80.02 − 79.98) / ( 0.027 √(1/13 + 1/8) ) = 3.3
p = 2 × P(t19 > 3.3) = 2 × (1 − P(t19 < 3.3)) = 2 × (1 − 0.9981) = 0.0038
There is evidence against H0 and the two population means are significantly
different.
△
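Because only summary statistics are given, the pooled test is computed directly in R rather than with t.test (which expects the raw data):

# Pooled two-sample t-test from summary statistics
nx <- 13; xbar <- 80.02; sx <- 0.024
ny <- 8;  ybar <- 79.98; sy <- 0.031
s2 <- ((nx - 1) * sx^2 + (ny - 1) * sy^2) / (nx + ny - 2)   # pooled variance
tstat <- (xbar - ybar) / (sqrt(s2) * sqrt(1 / nx + 1 / ny)) # about 3.3
p <- 2 * (1 - pt(abs(tstat), df = nx + ny - 2))             # about 0.004
c(T = tstat, p = p)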
Before performing the above test for the comparison of the means of two samples,
we need to check whether there is evidence against the assumption that the variances are equal:
Test Hypothesis Concerning the Equality of Two Population Variances

1. Null hypothesis: H0 : σA² = σB²

2. Alternative hypothesis:
   One-tailed test: H1 : σA² > σB²  (or H1 : σA² < σB²)
   Two-tailed test: H1 : σA² ≠ σB²

3. Test statistic:
   F = sA²/sB²

4. p-value from the Fisher F-distribution, with dfA = nA − 1 and dfB = nB − 1
   degrees of freedom:
   One-tailed test: p = P(F_{dfA,dfB} ≥ F)  (or p = P(F_{dfA,dfB} ≥ 1/F))
   Two-tailed test: p = 2 × P(F_{dfA,dfB} ≥ F)

5. Conclusion: based on the calculated p-value
Example*
Oxide layers on semiconductor wafers are etched in a mixture of gases to achieve
the proper thickness. The variability in the thickness of these oxide layers is a
critical characteristic of the wafer, and low variability is desirable for subsequent
processing steps. Two different mixtures of gases are being studied to determine
whether one is superior in reducing the variability of the oxide thickness. Twenty
wafers are etched in each gas. The sample standard deviations of oxide thickness
are s1 = 1.96 angstroms and s2 = 2.13 angstroms, respectively. Is there any
evidence to indicate that either gas is preferable?
Solution
H0 : σ1² = σ2²
H1 : σ1² ≠ σ2²
Two-tailed test.
Test statistic: F = s1²/s2² = 1.96²/2.13² = 3.84/4.54 = 0.85
p = 2 × P(F19,19 > 0.85)
From Table 12: F12,19(0.1) = 1.912 and F24,19(0.1) = 1.787, so that P(F19,19 > 0.85) > 0.1
and p > 0.2.
There is no evidence against H0, i.e. nothing to indicate that either gas results in a
smaller variance of oxide thickness.
△
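A short R check of the two-sided comparison, computed from the variance ratio with pf (with the raw data one could use var.test instead):

# Two-sided F-test for equality of two variances, from summary statistics
Fstat <- 1.96^2 / 2.13^2                                    # about 0.85
p <- 2 * min(pf(Fstat, 19, 19), 1 - pf(Fstat, 19, 19))      # well above 0.2
c(F = Fstat, p = p)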
If the above test indicates that there is evidence to reject the null hypothesis of
equality of the variances, the following strategy should be adopted:
Independent samples with unknown variances

1. Null hypothesis: H0 : (µA − µB) = µ0, where µ0 is some specific difference
   that you wish to test. If there is no difference between µA and µB, then
   H0 : µA = µB.

2. Alternative hypothesis:
   One-tailed test: H1 : (µA − µB) > µ0  (or H1 : (µA − µB) < µ0)
   Two-tailed test: H1 : (µA − µB) ≠ µ0

3. Test statistic:
   T = ( (x̄A − x̄B) − µ0 ) / √( sA²/nA + sB²/nB )

4. p-value from Student's t-distribution, with ν* degrees of freedom, where

   ν* = (vA + vB)² / ( vA²/(nA − 1) + vB²/(nB − 1) ),   vA = sA²/nA,  vB = sB²/nB

   One-tailed test: p = P(t_{ν*} ≥ T)  (or p = P(t_{ν*} ≤ T))
   Two-tailed test: p = 2 × P(t_{ν*} ≥ T)

5. Conclusion: based on the calculated p-value
7.4.4 Paired Observations
For paired comparison, the same individuals/items from the sample are subjected
to two different treatments A and B. Paired comparison data is analysed by
considering the differences of the pairs of observations. Thus, if the observations
are represented by the random variables XA and XB , we consider the derived
random variable X = XA − XB. It is easier to present the results in the following
table:

      A      B      X                    X²
1     XA1    XB1    X1 = XA1 − XB1       X1²
...   ...    ...    ...                  ...
n     XAn    XBn    Xn = XAn − XBn       Xn²

The mean of the distribution of X is µ, which equals µA − µB. We are interested
in the hypothesis that µA and µB differ by an amount µ0. Most probably the variance σ²
of the differences is unknown, but we can calculate the variance s² of X from the data.
Then the statistic T = (X̄ − µ0)/(s/√n) ∼ t_{n−1}.
A Paired Samples Test

1. Null hypothesis: H0 : X̄ = µ0.

2. Alternative hypothesis:
   One-tailed test: H1 : X̄ > µ0  (or H1 : X̄ < µ0)
   Two-tailed test: H1 : X̄ ≠ µ0

3. Test statistic:
   T = (x̄ − µ0)/(s/√n)
   where n is the number of paired differences, x̄ is the mean of the sample differences
   and s the standard deviation of the sample differences.

4. p-value from Student's t-distribution, with n − 1 degrees of freedom:
   One-tailed test: p = P(t_{n−1} ≥ T)  (or p = P(t_{n−1} ≤ T))
   Two-tailed test: p = 2 × P(t_{n−1} ≥ T)

5. Conclusion: based on the calculated p-value
Example*
The journal Human Factors (1962, pp. 375-380) reports a study in which n = 14
subjects were asked to parallel park two cars having very different wheel bases and
turning radii. The time in seconds for each subject was recorded and is given in
table below:
Subject   Configuration 1   Configuration 2   Difference
1         37.0              17.8              19.2
2         25.8              20.2              5.6
3         16.2              16.8              -0.6
4         24.2              41.4              -17.2
5         22.0              21.4              0.6
6         33.4              38.4              -5.0
7         23.8              16.8              7.0
8         58.2              32.2              26.0
9         33.6              27.8              5.8
10        24.4              23.2              1.2
11        23.4              29.6              -6.2
12        21.2              20.6              0.6
13        36.2              32.2              4.0
14        29.8              53.8              -24.0
Test if the first configuration of wheel bases and turning radii gives a faster car
parking time.
Solution
H0 : X̄ = 0
H1 : X̄ > 0
One-tailed test.
x̄ = (Σ xi)/n = (19.2 + 5.6 − 0.6 − 17.2 + 0.6 − 5.0 + 7.0 + 26.0 + 5.8 + 1.2 − 6.2 + 0.6 + 4.0 − 24.0)/14 = 1.21
s² = Σ(xi − x̄)²/(n − 1) = 160.78
s = 12.68
Test statistic: T = x̄/(s/√n) = 1.21/(12.68/√14) = 0.36
p = P(t13 > 0.36)
From Table 10: t13(0.4) = 0.2586 and t13(0.3) = 0.5375, so that 0.3 < p < 0.4.
No evidence to reject H0, i.e. both configurations give equal mean parking times.
△
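The same analysis with R's built-in paired t-test, using the parking times from the table:

# One-sided paired t-test on configuration 1 minus configuration 2
config1 <- c(37.0, 25.8, 16.2, 24.2, 22.0, 33.4, 23.8, 58.2, 33.6, 24.4, 23.4, 21.2, 36.2, 29.8)
config2 <- c(17.8, 20.2, 16.8, 41.4, 21.4, 38.4, 16.8, 32.2, 27.8, 23.2, 29.6, 20.6, 32.2, 53.8)
t.test(config1, config2, paired = TRUE, alternative = "greater")   # T about 0.36, p about 0.36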
Exercises
Exercise 7.3
Two random samples were independently drawn from two populations, A and B.
Is there evidence in the following data to indicate a difference in the population
means?
                A       B
Sample size n   6       5
Σ xi            297     322
Σ xi²           16103   21978
Mean            49.5    64.4
Variance        280.3   310.3
S.E. Mean       6.84    7.88
Solution
First we want to test the hypothesis that the variances of the two populations are equal.
H0 : σA² = σB²
H1 : σA² ≠ σB²
Two-tailed test.
Test statistic: F = sA²/sB² = 280.3/310.3 = 0.9033
p = 2 × P(F5,4 > 0.9033)
From Table 12: F4,5(0.1) = 3.520 and F4,5(0.05) = 5.192, which indicates that P(F5,4 > 0.9033) > 0.1
and p > 0.2.
There is no evidence to reject H0, i.e. the variances of the two populations are equal.
Now we can test the hypothesis that the means of the two populations are equal
(populations with unknown but equal variances).
H0 : µA = µB
H1 : µA ≠ µB
Two-tailed test.
s² = ( (nA − 1)sA² + (nB − 1)sB² ) / (nA + nB − 2) = (5 × 280.3 + 4 × 310.3)/(6 + 5 − 2) = 293.6
s = 17.14
Test statistic: T = (x̄A − x̄B) / ( s √(1/nA + 1/nB) ) = (49.5 − 64.4)/(17.14 √(1/6 + 1/5)) = −1.5
p = 2 × P(t9 < −1.5) = 2 × (1 − P(t9 < 1.5)) = 2 × (1 − 0.9161) = 0.1678
No evidence to reject H0, i.e. the means of the two populations are the same.
△
Exercise 7.4
The heights in inches of m male students and n female students were obtained. It
is well-known that males tend to be taller on average than females. However, it is
of interest to estimate the difference in mean heights between the sexes. General
anthropometric considerations suggest that male heights and female heights are
approximately Normal and that the population variances differ slightly. In the
general population the difference of mean male and female heights is 5.5 inches.
Thus, for our example it is of interest to test H0 : µM − µF = 5.5 against
H1 : µM − µF 6= 5.5. In this case we need to modify the t-statistic since we are
testing µM − µF = 5.5, rather than µM − µF = 0. Here
Male         Female
m = 41       n = 17
x̄ = 69.2     ȳ = 66.7
sx = 2.66    sy = 2.34
Solution
H0 : µx − µy = 5.5
H1 : µx − µy ≠ 5.5
Two-tailed test.
Independent samples with unknown variances.
Test statistic: T = ( (x̄ − ȳ) − µ0 ) / √( sx²/nx + sy²/ny ) = (69.2 − 66.7 − 5.5)/√(2.66²/41 + 2.34²/17) = −4.26
vx = sx²/nx = 2.66²/41 = 0.1726,  vy = sy²/ny = 2.34²/17 = 0.3221
ν* = (vx + vy)² / ( vx²/(nx − 1) + vy²/(ny − 1) ) = (0.1726 + 0.3221)² / ( 0.1726²/40 + 0.3221²/16 ) = 33.85
p = 2 × P(t34 < −4.26) = 2 × P(t34 > 4.26)
From Table 10: P(t34 > 3.601) = 0.0005, so we have that p < 0.001.
There is strong evidence against H0, i.e. the difference of mean male and female
heights is not 5.5 inches.
△
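A direct R computation of the Welch-type statistic and its approximate degrees of freedom from the summary statistics:

# Welch two-sample t-test of mu_M - mu_F = 5.5 from summary statistics
m <- 41; xbar <- 69.2; sx <- 2.66
n <- 17; ybar <- 66.7; sy <- 2.34
vx <- sx^2 / m; vy <- sy^2 / n
tstat <- (xbar - ybar - 5.5) / sqrt(vx + vy)              # about -4.26
nu <- (vx + vy)^2 / (vx^2 / (m - 1) + vy^2 / (n - 1))     # about 33.85
p <- 2 * pt(-abs(tstat), df = nu)                         # well below 0.001
c(T = tstat, nu = nu, p = p)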
Exercise 7.5
Sixteen patients sampled at random were assigned as matched pairs to two treatments, treatment A being assigned to a random member of each pair. A response
was measured and the data were:
A      B      X (difference)
14.0   13.2   +0.8
5.0    4.7    +0.3
8.6    9.0    −0.4
11.6   11.1   +0.5
12.1   12.2   −0.1
5.3    4.7    +0.6
8.9    8.7    +0.2
10.3   9.6    +0.7

Is there evidence of a difference in means?
Solution
H0 : X̄ = 0
H1 : X̄ ≠ 0
Two-tailed test.
x̄ = (Σ xi)/n = (0.8 + 0.3 − 0.4 + 0.5 − 0.1 + 0.6 + 0.2 + 0.7)/8 = 0.3250
s² = Σ(xi − x̄)²/(n − 1) = 0.171
s = 0.413
Test statistic: T = x̄/(s/√n) = 0.325/(0.413/√8) = 2.2
p = 2 × P(t7 > 2.2) = 2 × (1 − P(t7 < 2.2)) = 2 × (1 − 0.9681) = 0.0638
There is weak evidence against H0, i.e. that there is a difference in the means of the
two treatments.
△
7.5 Chi-square test
The chi-square test allows an analysis of whether there is a relationship between
two categorical variables. The chi-square test is used in two similar but distinct
circumstances:
• For estimating how closely an observed distribution matches an expected
distribution - we’ll refer to this as the goodness-of-fit test.
• For estimating whether two random variables are independent.
The chi-square test is always a right-tailed test.
7.5.1 The Goodness-of-Fit Test
One of the more interesting goodness-of-fit applications of the chi-square test is
to examine issues of fairness and cheating in games of chance, such as cards, dice,
and roulette. For example, if the die being used is fair, then the chance of any
particular number coming up is the same: 1 in 6. However, if the die is loaded,
then certain numbers will have a greater likelihood of appearing, while others will
have a lower likelihood.
Consider data that come from a discrete distribution with N categories, given by
the following table:

x                      x1   x2   ...   xN   Total
Observed frequencies   O1   O2   ...   ON   n
The key idea of the chi-square test is a comparison of observed frequencies Ok
and expected frequencies Ek. We calculate the expected frequencies Ek by multiplying the
total number of observations n by the hypothesized probability P(xk). The sum of the
expected frequencies should equal the total number of observations n: we cannot compare
Ok and Ek unless they both represent the same total collection of items. Then we can add
two rows to the table already given; these rows will contain the expected probabilities and
frequencies. The expected frequencies need not be integers.
x               x1             x2             ...   xN             Total
Observed freq.  O1             O2             ...   ON             n
Expected prob.  P(x1)          P(x2)          ...   P(xN)          1
Expected freq.  E1 = nP(x1)    E2 = nP(x2)    ...   EN = nP(xN)    n
If the data come from a continuous distribution with probability density
function f(x), then the classes are defined as intervals [xk, xk+1], together with the tails
x ≤ x1 and x ≥ xN:

x               x ≤ x1                      [x1, x2]                    ...   x ≥ xN                     Total
Observed freq.  O1                          O2                          ...   ON                         n
Expected prob.  P1 = ∫_{−∞}^{x1} f(x)dx     P2 = ∫_{x1}^{x2} f(x)dx     ...   PN = ∫_{xN}^{∞} f(x)dx     1
Expected freq.  E1 = nP1                    E2 = nP2                    ...   EN = nPN                   n
The test statistic is given by

X² = Σk (Ok − Ek)²/Ek ∼ χ²_ν

and has a chi-square distribution with ν = N − 1 − (number of parameters estimated)
degrees of freedom. In the above case one degree of freedom is lost because of the
restriction Σ_{k=1}^{N} Ek = n.
An important rule for the chi-square test is that the values of Ek should not be
allowed to fall below 5. This can be ensured by grouping together the top tail
of the distribution and treating, say, x > xl as one class, or by merging some other
classes together. The number of degrees of freedom is then ν = N̂ − 1 − (number of
parameters estimated), where N̂ is the number of classes after merging.
We find the p-value for the calculated statistic from a standard set of tables.
Obviously, in the ideal case where the observed frequencies equal the expected
frequencies, the statistic X² takes a value near 0 and the p-value is about 1, giving
no evidence to reject H0.
A goodness-of-fit test with chi-square

1. Establish the null hypothesis that the frequencies follow a particular distribution
   with defined probability P(x).

2. Calculate the expected frequency for each category of the table: Ek = nP(xk)
   (for a discrete r.v.) or Ek = n ∫_{xk}^{xk+1} f(x)dx (for a continuous r.v.).

3. Calculate the chi-square statistic

   X² = Σ_{k=1}^{N} (Ok − Ek)²/Ek

4. Assess the p-value of the statistic from the chi-square distribution with
   ν = N − 1 − (number of parameters estimated) degrees of freedom:

   p = P(χ²_ν ≥ X²)
5. Finally, decide whether to accept or reject the null hypothesis.
Example 7.5
The number of defects in printed circuit boards is hypothesized to follow a Poisson
distribution. A random sample of n = 60 printed boards has been collected, and
the following number of defects observed.
Number of defects     0    1    2    3
Observed frequency    32   15   9    4
Solution
The mean of the assumed Poisson distribution in this example is unknown and
must be estimated from the sample data. The estimate of the mean number of
defects per board is the sample average, that is, (32×0 + 15×1 + 9×2 + 4×3)/60 = 0.75.
From the Poisson distribution with parameter 0.75, we may compute pi, the
theoretical, hypothesized probability associated with the ith class interval. Since
each class interval corresponds to a particular number of defects, we may find the
pi as follows:

p1 = P(X = 0) = e^{−0.75}(0.75)⁰/0! = 0.472
p2 = P(X = 1) = e^{−0.75}(0.75)¹/1! = 0.354
p3 = P(X = 2) = e^{−0.75}(0.75)²/2! = 0.133
p4 = P(X ≥ 3) = 1 − p1 − p2 − p3 = 0.041

The expected frequencies are computed by multiplying the sample size n = 60
times the probabilities pi. That is, Ei = npi. The expected frequencies follow:
Number of defects     0       1       2       ≥ 3
Probability           0.472   0.354   0.133   0.041
Expected frequency    28.32   21.24   7.98    2.46
Since the expected frequency in the last cell is less than 3, we combine the last two
cells:

Number of defects     0       1       ≥ 2
Observed frequency    32      15      13
Expected frequency    28.32   21.24   10.44
The chi-square test statistic will have k − p − 1 = 3 − 1 − 1 = 1 degree of freedom,
because the mean of the Poisson distribution was estimated from the data.
χ² = (32 − 28.32)²/28.32 + (15 − 21.24)²/21.24 + (13 − 10.44)²/10.44 = 2.94.

Conclusion: p = P(χ²₁ ≥ 2.94) = 0.0886
No evidence against H0 .
△
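A hedged R sketch of this goodness-of-fit calculation. The statistic and p-value are computed directly, because chisq.test would not know that the Poisson mean was estimated from the data and would use the wrong degrees of freedom:

# Goodness-of-fit of a Poisson(0.75) model to the defect counts
obs <- c(32, 15, 13)                                   # classes 0, 1 and >= 2 (last cells merged)
probs <- c(dpois(0, 0.75), dpois(1, 0.75), 1 - ppois(1, 0.75))
expected <- 60 * probs                                 # about 28.3, 21.3, 10.4
X2 <- sum((obs - expected)^2 / expected)               # about 2.94
p <- 1 - pchisq(X2, df = length(obs) - 1 - 1)          # one extra df lost for the estimated mean
c(X2 = X2, p = p)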
7.5.2 Testing Independence
The other primary use of the chi-square test is to examine whether two variables
are independent or not. It is important to keep in mind that the chi-square test
only tests whether two variables are independent. It cannot address questions of
which is greater or less.
Suppose that nc characteristics are observed on each of n members of a sample,
and that each characteristic is classified into nr types, i.e. there are nc classes of the
first variable and nr classes of the second variable. A summary table, also called a
contingency table, would be drawn up, where each cell of the table gives the number
of sample members who have a particular characteristic and a particular type.
                         Variable I
Variable II     C1        ...   Cnc       Total
R1              O1,1      ...   O1,nc     r1
...             ...       ...   ...       ...
Rnr             Onr,1     ...   Onr,nc    rnr
Total           c1        ...   cnc       n
As with the goodness-of-fit example described earlier, the key idea of the chi-square
test for independence is a comparison of observed and expected values. How
many of something were expected and how many were observed in some process?
In the case of tabular data, however, we usually do not know what the distribution
should look like. Rather, in this use of the chi-square test, expected values are
calculated based on the row and column totals from the table. The expected value
for each cell of the table can be calculated using the following formula:

Ek,j = rk cj / n

where rk is the kth row total count, cj is the jth column total count and n is the
total number of observations in the sample. The first step, then, in calculating the
chi-square statistic in a test for independence is generating the expected value for
each cell of the table. Again, they need not be integers. This gives a corresponding
table of expected frequencies:
                         Variable I
Variable II     C1        ...   Cnc       Total
R1              E1,1      ...   E1,nc     r1
...             ...       ...   ...       ...
Rnr             Enr,1     ...   Enr,nc    rnr
Total           c1        ...   cnc       n
With these sets of figures, we calculate the chi-square statistic:

X² = Σk Σj (Ok,j − Ek,j)²/Ek,j

which has a chi-square distribution with ν = (nr − 1)(nc − 1) degrees of freedom,
where nr and nc are the numbers of rows and columns, respectively. We then find the
p-value for the calculated statistic from a standard set of tables.
Independence test with chi-square

1. Establish the null hypothesis that the variables are independent.

2. Calculate the expected frequency for each cell of the table: Ek,j = rk cj / n.

3. Calculate the chi-square statistic

   X² = Σk Σj (Ok,j − Ek,j)²/Ek,j

4. Assess the p-value of the statistic from the chi-square distribution with
   ν = (nc − 1)(nr − 1) degrees of freedom:

   p = P(χ²_ν ≥ X²)

5. Finally, decide whether to accept or reject the null hypothesis.
Example*
The following contingency table relates the duration of illness in days to the vaccination
status of pertussis patients. Test if duration of illness and vaccination status are independent.

              Illness duration in days
              < 30    31-60    > 61    Total
Not vacc.     45      104      77      226
Vacc.         64      64       45      173
Total         109     168      122     399

Solution
Expected frequencies:
E11 = 226 × 109/399 = 61.74
E12 = 226 × 168/399 = 95.16
E13 = 226 × 122/399 = 69.10
E21 = 173 × 109/399 = 47.26
E22 = 173 × 168/399 = 72.84
E23 = 173 × 122/399 = 52.90

              Illness duration in days
              < 30    31-60    > 61    Total
Not vacc.     61.74   95.16    69.10   226
Vacc.         47.26   72.84    52.90   173
Total         109     168      122     399

Test statistic: X² = Σ (Ok − Ek)²/Ek = (45 − 61.74)²/61.74 + (104 − 95.16)²/95.16 + (77 − 69.10)²/69.10
+ (64 − 47.26)²/47.26 + (64 − 72.84)²/72.84 + (45 − 52.90)²/52.90 = 14.45

Degrees of freedom ν = (3 − 1)(2 − 1) = 2
p = P(χ²₂ > 14.45)
From Table 8: χ²₂(0.001) = 13.82 and χ²₂(0.0005) = 15.2, so that 0.0005 < p < 0.001.
There is strong evidence against H0, i.e. the duration of illness depends on the
vaccination status.
△
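The built-in chisq.test reproduces this analysis directly from the observed table:

# Chi-square test of independence for illness duration and vaccination status
tab <- matrix(c(45, 104, 77,
                64, 64, 45), nrow = 2, byrow = TRUE,
              dimnames = list(c("Not vacc.", "Vacc."), c("<30", "31-60", ">61")))
chisq.test(tab)   # X-squared about 14.4 on 2 df, p about 0.0007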
Exercises
Exercise 7.6
In 116 randomly selected families with two children, 42 have no girls, 52 have one
girl and only 22 have two girls. Assuming births of either sex are equally likely,
do these data conflict with the hypothesis that the sexes of successive births are
independent?
Solution
H0 : ’The sexes of successive births are independent’
H1 : ’the sexes of successive births are not independent’
If the hypothesis is true, then the number of girls in any family of two children
follows a Binomial distribution B(2, 1/2). We can construct a table of frequencies:

Number of girls x                      0      1     2      Total
Observed frequency Ok                  42     52    22     116
Probability P(X = xk)                  0.25   0.5   0.25   1
Expected frequency Ek = nP(X = xk)     29     58    29     116

Test statistic: X² = Σk (Ok − Ek)²/Ek = (42 − 29)²/29 + (52 − 58)²/58 + (22 − 29)²/29 = 8.14
p = P(χ²₂ > 8.14)
From Table 8: χ²₂(0.025) = 7.378 and χ²₂(0.010) = 9.210, so that 0.010 < p < 0.025.
There is moderate evidence against H0, i.e. against the hypothesis that the sexes of
successive births are independent.
△
Exercise 7.7
The number of accidents in a month is observed over a period of ten years. Test
if the data follow a Poisson distribution. The data are
Number of accidents k   Observed frequency Ok   Probability   Expected frequency Ek
0                       41                      0.30119       36.14
1                       40                      0.36144       43.37
2                       22                      0.21686       26.02
3                       10                      0.08674       10.41
4                       6                       0.02602       3.12
5                       0                       0.00625       0.75
6                       1                       0.00125       0.15
7 or more               0                       0.00025       0.03
Total                   120                     1.00000       120
Solution
Combining the classes k ≥ 3 (so that no expected frequency falls below 5) gives an
observed frequency of 17 and an expected frequency of 14.46 for the last class.
Test statistic: X² = Σk (Ok − Ek)²/Ek = (41 − 36.14)²/36.14 + (40 − 43.37)²/43.37 + (22 − 26.02)²/26.02 + (17 − 14.46)²/14.46 = 1.98
Degrees of freedom ν = 4 − 1 = 3
p = P(χ²₃ > 1.98)
From Table 8: χ²₃(0.7) = 1.424 and χ²₃(0.5) = 2.366, so that 0.5 < p < 0.7.
There is no evidence against H0, i.e. the number of accidents follows a Poisson
distribution.
△
Exercise 7.8
The times to failure of 500 electrical items have been recorded as follows:
Time (hours), x   0-50   50-100   100-150   150-200   200-250   250-300   300-350   350-400
Frequency         208    112      75        40        30        18        11        6
Test whether these data follow an exponential distribution.
Solution
We need an estimate of the exponential distribution parameter, λ̂ = 1/x̄.
To estimate the mean we take the mid-point of each interval:
x̄ = (25×208 + 75×112 + 125×75 + 175×40 + 225×30 + 275×18 + 325×11 + 375×6)/500 = 47500/500 = 95
λ̂ = 1/95
The cumulative distribution function of the exponential distribution is F(x) = 1 − e^{−λ̂x}.
E1 = 500 × P(0 < X < 50) = 500 × (F(50) − F(0)) = 500 × (e^{−0/95} − e^{−50/95}) = 500 × 0.4092 = 204.6
E2 = 500 × P(50 < X < 100) = 120.9
E3 = 500 × P(100 < X < 150) = 71.4
E4 = 500 × P(150 < X < 200) = 42.2
E5 = 500 × P(200 < X < 250) = 24.9
E6 = 500 × P(250 < X < 300) = 14.7
E7 = 500 × P(300 < X < 350) = 8.7
E8 = 500 × P(350 < X < 400) = 5.1
E9 = 500 × P(X > 400) = 500 − 204.6 − 120.9 − 71.4 − 42.2 − 24.9 − 14.7 − 8.7 − 5.1 = 7.5

Time   0-50    50-100   100-150   150-200   200-250   250-300   300-350   350-400   > 400
Ok     208     112      75        40        30        18        11        6         0
Ek     204.6   120.9    71.4      42.2      24.9      14.7      8.7       5.1       7.5

Test statistic: X² = Σ (Ok − Ek)²/Ek = 10.96
Degrees of freedom ν = 9 − 1 − 1 = 7 (because we have estimated the parameter λ)
p = P(χ²₇ > 10.96)
From Table 8: χ²₇(0.2) = 9.803 and χ²₇(0.1) = 12.02, so that 0.1 < p < 0.2.
No evidence to reject H0, i.e. the times to failure follow an exponential distribution.
△
Exercise 7.9
A survey of smoking habits in a sixth form sampled 50 boys and 40 girls at random
and the frequencies were noted in the following contingency table:
         Non-smokers   Light smokers   Heavy smokers   Total
Boys     16            20              14              50
Girls    24            10              6               40
Total    40            30              20              90
Is there evidence of differences between the sexes? We are comparing two
distributions (over smoking habits) so the test is one of similarity.
Solution
Table of expected frequencies:

         Non-smokers   Light smokers   Heavy smokers   Total
Boys     22.2          16.7            11.1            50
Girls    17.8          13.3            8.9             40
Total    40            30              20              90

where, for example, 22.2 = 50 × 40/90.

Test statistic: X² = Σ (Ok − Ek)²/Ek = (16 − 22.2)²/22.2 + (20 − 16.7)²/16.7 + (14 − 11.1)²/11.1
+ (24 − 17.8)²/17.8 + (10 − 13.3)²/13.3 + (6 − 8.9)²/8.9 = 7.06

Number of degrees of freedom: ν = (2 − 1)(3 − 1) = 2
p = P(χ²₂ > 7.06)
From Table 8: χ²₂(0.05) = 5.991 and χ²₂(0.025) = 7.378, so that 0.025 < p < 0.05.
There is moderate evidence to reject H0, i.e. smoking habits differ between the sexes.
△