9 Tests and Confidence Intervals
9.1 Testing Hypotheses
9.1.1 Introduction
A statistical hypothesis is an assertion about some population or probability distribution. We do
not know whether or not the assertion is valid but we may find relevant evidence in the data. Here
are some examples of hypotheses:
• the population mean number of vehicles using a short ferry crossing on summer Saturdays is
500.
• the population mean number of vehicles using a short ferry crossing on summer Saturdays is
more than 500.
• a modification to the design of hulls has no effect on fuel consumption.
• a modification to the design of hulls does have an effect on fuel consumption.
The usual way to test hypotheses is by means of significance tests. In a significance test we
test a hypothesis called the null hypothesis by assessing the evidence in the data against the null
hypothesis and in favour of an alternative hypothesis.
9.1.2 Testing a mean: the normal distribution
Many naturally occurring continuous variables have, at least approximately, normal distributions. This is, in fact, the origin of the name “normal” distribution. Suppose, for example, we do an experiment to measure some physical quantity. Suppose that the true value of the physical quantity is M but this value is, of course, unknown to us. Our measuring instrument gives measurement errors in such a way that any measurement X has a N(M, 16) distribution. That is, the standard deviation of the measurement errors is 4. Suppose also that we can repeat the experiment and the measurement errors in the replications are independent. (This example is a little artificial because, at this stage, we need to assume that the variance is known whereas, in reality, it is usually unknown. We will deal with the case of unknown variance in Lecture 10.)
Suppose that we will make n = 25 measurements. It can be shown that the sample mean X̄ has a normal distribution with mean M and variance
$$\frac{\sigma^2}{n} = \frac{16}{25} = 0.64.$$
So X̄ is an unbiased estimator of M and its standard error is
$$\sqrt{\frac{\sigma^2}{n}} = \sqrt{\frac{16}{25}} = 0.8.$$
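The statement about X̄ can be checked numerically. The minimal Python sketch below repeats the 25-measurement experiment many times and looks at the spread of the resulting sample means; the true value M = 10, the seed and the number of replications are arbitrary assumptions chosen only for illustration.

```python
import random
import statistics

def simulate_sample_means(true_value, sd, n, reps, seed=1):
    """Return `reps` sample means, each from n independent N(true_value, sd^2) measurements."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(true_value, sd) for _ in range(n))
        for _ in range(reps)
    ]

# Hypothetical true value M = 10; sd = 4 (variance 16) and n = 25 as in the text.
means = simulate_sample_means(true_value=10.0, sd=4.0, n=25, reps=20000)
print(round(statistics.fmean(means), 3))   # average of the sample means, near M
print(round(statistics.stdev(means), 3))   # spread of the sample means, near the standard error
```

With these settings the two printed values should be close to 10 and 0.8 respectively, in line with the theory.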
Now, suppose that we have some theoretical value M0 for M and we want to test the hypothesis that M = M0. Then our null hypothesis is
$$H_0: M = M_0.$$
There are various different alternative hypotheses which we could have. For example, we could have
$$H_A: M < M_0.$$
That is, we are testing against the alternative that M is less than M0.
We need a test statistic. We use
$$Z = \frac{\bar{X} - M_0}{\sqrt{\sigma^2/n}}.$$
When the null hypothesis is true, Z ∼ N(0, 1). The N(0, 1) distribution is called the standard normal distribution.
We need a critical region (or rejection region). This is the set of values of Z which will lead us
to reject the null hypothesis in favour of the alternative. In this example the critical region will
have the form Z < k (since small values of Z would be expected under the alternative hypothesis).
The value of k depends on the significance level (or size) of the test. This is the probability that
we would reject the null hypothesis if it were true. The usual values used for significance levels
are 0.05 (5%), 0.01 (1%) and 0.001 (0.1%). Thus the null hypothesis is rejected if a value of the
test statistic occurs which would be unusual under the null hypothesis but more likely under the
alternative.
It can be shown that, if Z ∼ N(0, 1), then
$$\Pr(Z < -1.6449) = 0.05, \qquad \Pr(Z < -2.3263) = 0.01, \qquad \Pr(Z < -3.0902) = 0.001.$$
So
• if Z > −1.6449 we do not reject H0,
• if Z < −1.6449 we reject H0 at the 5% level,
• if Z < −2.3263 we reject H0 at the 1% level and
• if Z < −3.0902 we reject H0 at the 0.1% level.
If we reject H0 then we conclude that the evidence suggests that M < M0 .
Note that, if we reject H0 at the 5% level we say that the result is “significant at the 5%
level” and so on. Sometimes “significant at the 5% level” is called “significant”, “significant at the
1% level” is called “highly significant” and “significant at the 0.1% level” is called “very highly
significant.”
Figure 1 shows the rejection region at the 5% level.
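The one-sided test above can be sketched in Python using the standard library's `statistics.NormalDist`, whose `inv_cdf` method gives the critical values k in place of tables. The function names and the data summary x̄ = 8.3 are hypothetical, used only for illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(0, 1)

def z_statistic(xbar, m0, sigma2, n):
    """Z = (xbar - M0) / sqrt(sigma^2 / n); Z ~ N(0, 1) when H0: M = M0 is true."""
    return (xbar - m0) / (sigma2 / n) ** 0.5

def lower_tail_decision(z, levels=(0.05, 0.01, 0.001)):
    """Smallest listed level at which H0 is rejected in favour of HA: M < M0
    (critical region Z < k with Pr(Z < k) = level), or None if not rejected."""
    rejected = [a for a in levels if z < STANDARD_NORMAL.inv_cdf(a)]
    return min(rejected) if rejected else None

# Critical values k for the 5%, 1% and 0.1% levels.
for a in (0.05, 0.01, 0.001):
    print(a, round(STANDARD_NORMAL.inv_cdf(a), 4))

# Hypothetical data: xbar = 8.3 from n = 25 measurements, M0 = 10, sigma^2 = 16.
z = z_statistic(8.3, 10.0, 16.0, 25)
print(round(z, 3), lower_tail_decision(z))
```

Here Z = (8.3 − 10)/0.8 = −2.125, which is below −1.6449 but not below −2.3263, so the result would be significant at the 5% level but not at the 1% level.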
9.1.3 Two-sided tests
The alternative hypothesis above, M < M0, is called one-sided. The test is one-sided since we reject H0 only when Z < k. We often test against two-sided alternatives, e.g. M ≠ M0, and reject if Z < k1 or Z > k2. For example, in a 5% test we reject if Z < −1.96 or Z > 1.96. This is because, under H0, Pr(Z < −1.96) = Pr(Z > 1.96) = 0.025, so the total probability of rejection is 0.05, as required. The rejection region is shown in Figure 2.
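A corresponding sketch for the two-sided test, again assuming Python's `statistics.NormalDist`; the helper names are hypothetical.

```python
from statistics import NormalDist

def two_sided_critical_value(alpha):
    """c with Pr(Z < -c) = Pr(Z > c) = alpha / 2 for Z ~ N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def reject_two_sided(z, alpha=0.05):
    """Reject H0: M = M0 in favour of HA: M != M0 when Z < -c or Z > c."""
    c = two_sided_critical_value(alpha)
    return z < -c or z > c

print(round(two_sided_critical_value(0.05), 2))  # 1.96
print(reject_two_sided(-1.7))  # False, although -1.7 < -1.6449
```

Note that Z = −1.7 would lead to rejection in the one-sided 5% test against M < M0 but not in the two-sided 5% test.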
9.1.4 Testing a proportion: Example
It is suggested that most components of a certain type will work for 2000 hours without failing.
Let p be the probability that a component of this type will fail before completing 2000 hours of
work. Suppose we intend to test 100 components in order to test the null hypothesis that p = 0.5
against the alternative that p > 0.5.
Null hypothesis          H0: p = 0.5
Alternative hypothesis   HA: p > 0.5
We need a test statistic. Let us use the number of components which fail before 2000 hours.
Call this T.
We need a critical region (or rejection region). This is the set of values of T which will lead us
to reject the null hypothesis in favour of the alternative. In this example the critical region will
have the form T > k (since large values of T would be expected under the alternative hypothesis).
The value of k depends on the significance level (or size) of the test. This is the probability that
we would reject the null hypothesis if it were true. The usual values used for significance levels
are 0.05 (5%), 0.01 (1%) and 0.001 (0.1%). Thus the null hypothesis is rejected if a value of the
test statistic occurs which would be unusual under the null hypothesis but more likely under the
alternative.
Figure 1: Rejection region, one-sided, 5%. [Plot of the standard normal density against Z.]
Figure 2: Rejection region, two-sided, 5%. [Plot of the standard normal density against Z.]
In this example, under H0, the distribution of the test statistic is T ∼ Bin(100, 0.5). When we have a binomial distribution with large n and with neither np nor n(1 − p) too close to zero, we can use the normal distribution to calculate the binomial probabilities approximately. We use a normal distribution with the same mean and variance as the required binomial distribution, so we use µ = np and σ² = np(1 − p). In this case the distribution of T may be approximated as N(50, 25). We can show that
• if T ≤ 58 we do not reject H0,
• if T > 58 we reject H0 at the 5% level,
• if T > 61 we reject H0 at the 1% level and
• if T > 65 we reject H0 at the 0.1% level.
(These cut-offs come from 50 + 1.6449 × 5 ≈ 58.2, 50 + 2.3263 × 5 ≈ 61.6 and 50 + 3.0902 × 5 ≈ 65.5, without a continuity correction.)
If we reject H0 then we conclude that the evidence suggests that p > 0.5.
Note that, if we reject H0 at the 5% level we say that the result is “significant at the 5%
level” and so on. Sometimes “significant at the 5% level” is called “significant”, “significant at the
1% level” is called “highly significant” and “significant at the 0.1% level” is called “very highly
significant.”
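The cut-offs 58, 61 and 65 can be reproduced with the normal approximation. This is a minimal sketch, assuming (as the numbers in the text suggest) that no continuity correction is used; `upper_cutoffs` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def upper_cutoffs(n, p0, levels=(0.05, 0.01, 0.001)):
    """Cut-offs c such that H0: p = p0 is rejected at each level when T > c.

    T ~ Bin(n, p0) is approximated by N(n*p0, n*p0*(1 - p0)) without a continuity
    correction, so we reject when T > mu + z*sigma, z being the upper `level` point
    of N(0, 1). T is an integer, so (when mu + z*sigma is not itself an integer)
    this is the same as T > floor(mu + z*sigma).
    """
    mu = n * p0
    sigma = math.sqrt(n * p0 * (1 - p0))
    return {a: math.floor(mu + NormalDist().inv_cdf(1 - a) * sigma) for a in levels}

print(upper_cutoffs(100, 0.5))  # {0.05: 58, 0.01: 61, 0.001: 65}
```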
9.1.5 Power
The probability that H0 is rejected when it is true is the significance level. The probability that
H0 is rejected when HA is true is the power of the test. This usually depends on the value of a
parameter.
9.1.6 Example continued
Consider the test at the 5% level. The distribution of the test statistic is T ∼ Bin(100, p) or, approximately, N(100p, 100p[1 − p]). We reject H0 if T > 58. We can calculate the power of this test for any given value of p.
We can increase the power by increasing the sample size, n. If n is small then even fairly large values of p may well fail to lead to rejection of H0. If n is very large then H0 might well be rejected as a result of a trivial difference between p and 0.5. In designing the experiment we must choose the value of n bearing in mind the cost of testing components and the size of deviation from 0.5 which we want to be reasonably sure of detecting.
See figure 3.
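A minimal sketch of the power calculation, assuming Python 3.8+ for `math.comb`. It computes the power both exactly from Bin(n, p) and from the normal approximation mentioned above, and then shows how, at p = 0.6, the power changes with n when the 5% cut-off is recomputed for each n in the same way as for n = 100. The function names and the particular values of p and n are illustrative assumptions.

```python
import math
from statistics import NormalDist

def power_exact(p, n=100, c=58):
    """Exact Pr(T > c) for T ~ Bin(n, p): the power of 'reject H0: p = 0.5 if T > c'."""
    return sum(math.comb(n, t) * p**t * (1 - p) ** (n - t) for t in range(c + 1, n + 1))

def power_normal(p, n=100, c=58):
    """Same power from the N(np, np(1 - p)) approximation, no continuity correction (0 < p < 1)."""
    return 1 - NormalDist(n * p, math.sqrt(n * p * (1 - p))).cdf(c)

def cutoff_5pct(n, p0=0.5):
    """5% one-sided cut-off for sample size n, found as in the text (normal approximation)."""
    mu, sigma = n * p0, math.sqrt(n * p0 * (1 - p0))
    return math.floor(mu + NormalDist().inv_cdf(0.95) * sigma)

# Power curve for n = 100, c = 58 (p = 0.5 gives the actual size of the test).
for p in (0.5, 0.55, 0.6, 0.65, 0.7):
    print(p, round(power_exact(p), 3), round(power_normal(p), 3))

# Effect of sample size on the power at p = 0.6.
for n in (25, 100, 400):
    c = cutoff_5pct(n)
    print(n, c, round(power_exact(0.6, n, c), 3))
```

The exact power at p = 0.5 is the actual size of the test, which is close to, but not exactly, 0.05 because T is discrete.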
9.1.7 Two-sided tests
The alternative hypothesis above, p > 0.5, is called one-sided. The test is one-sided since we reject H0 only when T > k. We often test against two-sided alternatives, e.g. p ≠ 0.5, and reject if T > k2 or T < k1. For example, in a 5% test as above with n = 100, H0: p = 0.5 and HA: p ≠ 0.5, we reject H0 if T < 41 or if T > 59. Note that, when H0 is true, Pr(T < 41) and Pr(T > 59) are each approximately 0.025 (the cut-offs come from 50 ± 1.96 × 5, i.e. 40.2 and 59.8), giving a total of approximately 0.05.
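The two-sided cut-offs 41 and 59 follow from 50 ± 1.96 × 5 in the same way. Here is a sketch, again assuming the normal approximation without a continuity correction (the function name is hypothetical).

```python
import math
from statistics import NormalDist

def two_sided_cutoffs(n, p0, alpha=0.05):
    """Return (lower, upper): reject H0: p = p0 if T < lower or T > upper.

    Uses the N(n*p0, n*p0*(1 - p0)) approximation without a continuity correction:
    reject when |T - mu| > z * sigma, where z is the upper alpha/2 point of N(0, 1).
    """
    mu = n * p0
    sigma = math.sqrt(n * p0 * (1 - p0))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(mu - z * sigma), math.floor(mu + z * sigma)

print(two_sided_cutoffs(100, 0.5))  # (41, 59)
```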
9.2 Interval Estimation (Confidence Intervals)
9.2.1 Introduction
A point estimate of an unknown parameter gives us a single value calculated in such a way that it
is “likely” to be “close” to the true value. We can use a hypothesis test to test the hypothesis that
a parameter takes a particular value. It is also useful to be able to give a range of values, around
our point estimate, to indicate just how close the estimate is “likely” to be to the true value and
just how “confident” we can be in this.
I have put quotation marks around some of the words here because I used them rather loosely. When dealing with confidence intervals it is necessary to choose words carefully. If we were willing to take the approach of adjusting our beliefs about an unknown quantity, expressed as a probability distribution, when we observe data, then we could simply give the probability that the true value of the quantity was inside a given range. However, many people are apparently unwilling to do this, so the approach of confidence intervals has been developed.
Figure 3: Power of Test.
Suppose we want to estimate some quantity, say µ, the mean mark of students on a standard
test, and we are going to do this by collecting a sample of data. Suppose we define two statistics
L and U which we will calculate from these data (in the same way that the sample mean, X̄, is a
statistic calculated from data). Suppose we know that, if we repeated this experiment many times, then 95% of the time µ would be between L and U, i.e. L < µ < U. That is, before we collect the data, we can say that
$$\Pr(L < \mu < U) = 0.95.$$
Then, once we have observed the data and calculated L = l and U = u, we can say that we are 95% confident that
$$l < \mu < u$$
and this interval is a 95% confidence interval. In the same sort of way we can have, for example,
90% or 99% confidence intervals. A 99% confidence interval from the same experiment would be
wider than a 95% confidence interval but a 90% confidence interval would be narrower.
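The repeated-experiment interpretation can be illustrated by simulation. In this sketch the statistics are, as an assumption for illustration, the known-variance interval L = X̄ − zσ/√n and U = X̄ + zσ/√n derived in 9.2.3 below, and µ = 60, σ = 10, n = 20 are made-up values for test marks. The fraction of simulated experiments with L < µ < U should be close to the nominal level, and a larger z (higher confidence) gives a wider interval.

```python
import random
import statistics

def coverage(mu, sigma, n, z, reps=10000, seed=2):
    """Fraction of simulated experiments in which L < mu < U,
    where L, U = xbar -/+ z * sigma / sqrt(n)."""
    rng = random.Random(seed)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / reps

# z values for 90%, 95% and 99% two-sided intervals.
for level, z in ((0.90, 1.6449), (0.95, 1.96), (0.99, 2.5758)):
    print(level, coverage(mu=60, sigma=10, n=20, z=z))
```

With 10,000 replications the printed coverages should typically be within about one percentage point of 0.90, 0.95 and 0.99.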
9.2.2 Example 1
There are several ways to find confidence intervals but we can illustrate one of them by using the example in 9.1.4. Suppose we are going to observe 100 components and observe T, the number failing before 2000 hours. Suppose the true value of p is p*. We know that the probability of rejecting the null hypothesis H0: p = p* in a test at the 5% level is 0.05, so the probability of not rejecting H0 must be 0.95. Now suppose we test all possible values of p in this way. That is, for every value x we test the null hypothesis that p = x. Suppose we then collect together all of the values which were not rejected. Since there is a 95% chance that p* will not be rejected, there must be a 95% chance that p* will be in the interval formed in this way. Therefore it is a 95% confidence interval. (Actually, because T is discrete and the test relies on the normal approximation, it will only be approximately a 95% confidence interval.)
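Test inversion can be sketched directly: for each value x on a fine grid we apply the two-sided 5% normal-approximation test of H0: p = x, and keep the values that are not rejected. This is a minimal sketch; the grid step, the observed value T = 50 and the function name are illustrative assumptions.

```python
import math

def inverted_test_interval(t, n, z=1.96, step=0.001):
    """Approximate 95% confidence interval for p: the grid values x for which
    H0: p = x is NOT rejected by the two-sided normal-approximation test
    |t - n*x| / sqrt(n*x*(1 - x)) > z."""
    kept = []
    for i in range(1, round(1 / step)):
        x = i * step
        if abs(t - n * x) / math.sqrt(n * x * (1 - x)) <= z:
            kept.append(x)
    return min(kept), max(kept)

print(inverted_test_interval(t=50, n=100))
```

For T = 50 the retained values run from about 0.404 to 0.596.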
9.2.3 Example 2: Estimating the mean of a normal distribution when the variance is known
When our data are measurements, such as fuel consumptions or times to complete a voyage, we
often use the normal distribution as a model. Sometimes we transform the data first so that the
normal distribution is a better description. For example, with the voyage times we might use the
logarithms of the times. The normal distribution with mean µ and variance σ² is written N(µ, σ²). Usually we would not know the value of either µ or σ², but the analysis is slightly simpler if we know the value of σ², so, just for now, we assume that we do.
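Let X1, . . . , Xn be i.i.d. observations from a N(µ, σ²) distribution where σ² is known. Before the data are observed, (X̄ − µ)/(σ/√n) ∼ N(0, 1), so this quantity lies between −1.96 and 1.96 with probability 0.95. Then, when the data have been observed, if the observed value of the sample mean X̄ is x̄, we say that we are 95% confident that
$$-1.96 < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < 1.96.$$
That is,
$$\bar{x} - 1.96\,\sigma/\sqrt{n} < \mu < \bar{x} + 1.96\,\sigma/\sqrt{n}.$$
This is a 95% confidence interval for µ.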
We can have other confidence levels, e.g. 99%, by replacing 1.96 with appropriate constants
found by inspection of tables of the standard normal distribution function.
We can also have one-sided confidence intervals. For example, for N(µ, σ²) with σ² known,
$$\mu > \bar{x} - 1.6449\,\sigma/\sqrt{n}$$
is a one-sided 95% confidence interval for µ.
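The intervals above can be computed as follows, assuming Python's `statistics.NormalDist` for the normal quantiles instead of tables; the data summary (x̄ = 12.3, σ = 2, n = 16) and the function names are hypothetical. The printed intervals also illustrate that a 99% interval is wider than a 95% one, which is wider than a 90% one.

```python
import math
from statistics import NormalDist

def normal_mean_ci(xbar, sigma, n, level=0.95):
    """Symmetric interval for mu when sigma is known: xbar -/+ z * sigma / sqrt(n),
    where Pr(-z < Z < z) = level for Z ~ N(0, 1)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

def normal_mean_lower_bound(xbar, sigma, n, level=0.95):
    """One-sided interval mu > xbar - z * sigma / sqrt(n), where Pr(Z < z) = level."""
    return xbar - NormalDist().inv_cdf(level) * sigma / math.sqrt(n)

# Hypothetical data summary: xbar = 12.3, sigma = 2, n = 16 (so sigma / sqrt(n) = 0.5).
for level in (0.90, 0.95, 0.99):
    lo, hi = normal_mean_ci(12.3, 2.0, 16, level)
    print(level, round(lo, 3), round(hi, 3))
print(round(normal_mean_lower_bound(12.3, 2.0, 16), 3))
```

For the 95% level this gives 12.3 ± 1.96 × 0.5, i.e. roughly (11.32, 13.28), and the one-sided bound is about 12.3 − 1.6449 × 0.5 ≈ 11.48.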
9.2.4 Sample size determination for point estimation: Example
Suppose we plan to take n i.i.d. observations X1 , . . . , Xn on the N (µ, 4) distribution. What is the
probability that X̄ is within ±1 of µ?
Since X̄ ∼ N(µ, 4/n), this probability is 2Φ(√n/2) − 1, where Φ is the standard normal distribution function. Its values for different n are as follows.
n     Pr(µ − 1 < X̄ < µ + 1)
1     0.3829
2     0.5205
3     0.6135
4     0.6827
5     0.7364
6     0.7793
7     0.8141
8     0.8427
9     0.8664
10    0.8862
11    0.9027
12    0.9167
13    0.9286
14    0.9386
15    0.9472
16    0.9545
What is the smallest sample size which gives Pr(µ − 1 < X̄ < µ + 1) ≥ 0.95 when X ∼ N (µ, 4)?
The smallest sample size which gives Pr(µ − 1 < X̄ < µ + 1) ≥ 0.95 when X ∼ N (µ, 4) is
n = 16.
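The table and the answer n = 16 can be reproduced numerically. A minimal sketch, using `statistics.NormalDist` for Φ and hypothetical function names:

```python
from statistics import NormalDist

def prob_within(n, sigma=2.0, d=1.0):
    """Pr(mu - d < Xbar < mu + d) when Xbar ~ N(mu, sigma^2 / n),
    which equals 2 * Phi(d * sqrt(n) / sigma) - 1."""
    return 2 * NormalDist().cdf(d * n ** 0.5 / sigma) - 1

def smallest_n(target=0.95, sigma=2.0, d=1.0):
    """Smallest sample size n with prob_within(n) >= target."""
    n = 1
    while prob_within(n, sigma, d) < target:
        n += 1
    return n

for n in range(1, 17):
    print(n, round(prob_within(n), 4))
print(smallest_n())  # 16
```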
9.3 Problems
1. The minimum diameters, in mm, of 50 roller bearings are measured. The measurements
are given below. Assume that, because the sample size is fairly large, we may use a normal
distribution with standard deviation given by the sample standard deviation as a reasonable
approximation. Give a symmetric 99% confidence interval for the population mean minimum
diameter.
Do the data suggest that the population mean is not 30mm?
30.05  30.06  29.99  30.05  30.09  30.06  30.14  30.06  29.98  30.07
30.04  30.05  30.05  30.00  30.00  30.06  30.08  30.10  30.09  30.16
30.02  30.10  30.10  30.13  30.02  30.05  30.05  30.00  30.08  30.02
30.09  30.06  30.07  30.11  30.01  30.00  30.01  30.01  30.09  30.15
30.11  30.06  30.01  30.01  30.07  30.09  30.08  30.07  30.04  30.05
2. A factory makes outboard motors. In a sample of 50 motors, 22 are found to need carburettor
adjustments. Suppose that the population proportion requiring adjustment is θ. Test the
null hypothesis that θ = 0.4 against the alternative that θ > 0.4 and give a symmetric 95%
confidence interval for θ.
Hint: To find the confidence interval, if we ignore the continuity correction, solve the equations
$$\frac{x - n\theta}{\sqrt{n\theta(1-\theta)}} = 1.96 \quad\text{and}\quad \frac{x - n\theta}{\sqrt{n\theta(1-\theta)}} = -1.96$$
for θ when x = 22 and n = 50. We can make a slight modification to allow for the continuity correction.
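Squaring either equation in the hint gives the same quadratic, (n + z²)θ² − (2x + z²)θ + x²/n = 0 with z = 1.96, whose smaller root solves the +1.96 equation and larger root the −1.96 equation. A minimal sketch of this (no continuity correction; the function name is hypothetical), checked here on the illustrative values x = 50, n = 100 rather than on the problem's data:

```python
import math

def proportion_ci(x, n, z=1.96):
    """Solve (x - n*theta) / sqrt(n*theta*(1 - theta)) = +z and = -z for theta.

    Squaring either equation gives the same quadratic
        (n + z^2) theta^2 - (2x + z^2) theta + x^2 / n = 0;
    its smaller root solves the '+z' equation and its larger root the '-z' one.
    No continuity correction is applied.
    """
    a = n + z**2
    b = -(2 * x + z**2)
    c = x**2 / n
    disc = math.sqrt(b**2 - 4 * a * c)
    return (-b - disc) / (2 * a), (-b + disc) / (2 * a)

lo, hi = proportion_ci(50, 100)  # illustrative numbers, not the problem's data
print(round(lo, 4), round(hi, 4))
```

Substituting x = 22 and n = 50 gives the interval asked for in the problem.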