Engineering Statistics
Chapter 4
Hypothesis Testing
4B
Testing on Variance &
Proportion of Variances
Why test variance?
• In real life, things vary. Even under the strictest
conditions, results differ from one another. This
gives rise to variance.
• When we get a sample of data, apart from testing the
mean to see if it is reasonable, we also test the
variance to see whether some factor has caused
greater or smaller variation among the data.
• While larger variance may indicate the influence of
some undesirable element, certain factors also bring
the data closer, resulting in smaller variance.
Whether welcome or not, we need to know if these
factors exist.
Differences
• In nature, variation exists. Even identical twins
have some differences. This makes life interesting.
• In production, uniformity is usually desirable.
Farmers try to find ways to grow tomatoes of the
same size, or nearly so, so that they can grade
them and pack them in standard packs.
• Manufacturers thrive on the uniformity of their
products. That is the conveyor belt industry's
prerequisite. It would be a nightmare if the cars
coming off a production line were all different.
Nature against Production
• In real life, however, variation exists. When things
come out uniform, it is a suspicious event.
• Hence if we find that a garden produces fruits of
exactly the same size, then it means that either
there is an interference in the process, for
example, where the fruits are kept in fixed-shaped
containers while growing, or they are artificial.
• Another example of suspicious uniformity is when
the signatures on many copies match exactly.
This may be the result of tracing, in which case
the signatures are not genuine.
Purpose of Variance Analysis
• Suppose we have introduced a procedure to make
our product uniform. How can we be sure that the
aim has been achieved, i.e. that the variance has
become smaller?
• Conversely, if we find that the scores of students
come out nearly the same, we wonder if there is
copying among them, because normally we expect
them to get different scores.
• These are the situations in which tests on variance
are carried out.
Variation of variance
• When we collect sample data, the values nearly
always vary on both sides of the mean. Such
variation is part of the natural distribution. In
general, we expect the variation (i.e. the standard
deviation) to fall within a certain range. Both
unusually large and unusually small variances are
abnormal.
• Hence, in testing on variance, we may test for
large values or small values. These correspond to
right-tail and left-tail tests. Similarly, we may also
test for the variance to be within a range, which is
a two-tail test.
Modeling variance
• There are two ways in which variances are tested:
either a sample variance against the assumed
population variance, or the variances of two
populations against each other.
• As we learned in Ch 3 (3B), the sample variance
follows the $\chi^2$-distribution when compared to the
population variance. Hence, we shall test the sample
variance $s^2$ against the population variance $\sigma^2$
using the distribution $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$.
Left and right-tails
• As there is a great difference between the left and
right tails of a $\chi^2$-distribution, we need to read
the two ends separately.
• As usual, if we are testing on the right tail, we
read the $\chi^2$-value at $\alpha$. However, for the left
tail, we read the value at $1-\alpha$.
• If we are comparing the variances of two samples,
the distribution to be used is the F-distribution,
with $n_1-1$ and $n_2-1$ degrees of freedom.
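The tail convention for the $\chi^2$ test can be stated compactly. The block below is a sketch that assumes, as the worked examples in this chapter do, that the table entry $\chi^2_{\alpha,\nu}$ is the value with area $\alpha$ to its right; the symbol $\nu$ for the degrees of freedom is introduced here for convenience.

```latex
\[
P\left(\chi^2_{\nu} > \chi^2_{\alpha,\nu}\right) = \alpha
\]
\[
\text{Right-tail test: reject } H_0 \text{ if } \chi^2 > \chi^2_{\alpha,\nu}
\]
\[
\text{Left-tail test: reject } H_0 \text{ if } \chi^2 < \chi^2_{1-\alpha,\nu}
\]
\[
\text{Two-tail test: reject } H_0 \text{ if }
\chi^2 < \chi^2_{1-\alpha/2,\nu} \text{ or } \chi^2 > \chi^2_{\alpha/2,\nu}
\]
```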
Example 1
• The administrative manager of our company feels
that when the files of a customer have very
different sizes, storage space is wasted. To
avoid wasting the space allocated for common
computer files in an office, he introduces a new
format for all such files. After using the format for
a month, the manager checks 25 such files and
finds that the standard deviation of the sizes is 3.3
kb. Previously the standard deviation was 5.4 kb.
At the 90% level of confidence, test the hypothesis
that the standard deviation has been reduced.
Example 1 (Analysis)
• Since there are 25 data points, we model the
variance $s^2$ using a $\chi^2$-distribution with
$\nu = 25-1 = 24$ degrees of freedom.
• The model is $(25-1)s^2/5.4^2 \sim \chi^2_{24}$.
• As we are testing whether the variance
(hence the standard deviation) has gone
down, we run a test on the left tail of the
distribution.
The procedure
Null hypothesis: $\sigma^2 = 5.4^2$;
Alternative hypothesis: $\sigma^2 < 5.4^2$.
Test statistic: $\chi^2 = 24s^2/5.4^2 \sim \chi^2_{24}$.
At 90% confidence, $\alpha = 0.1$, $1-\alpha = 0.9$, $\chi^2_{0.9,24} = 15.659$.
Hence the null hypothesis will be accepted
if $\chi^2 \ge 15.659$.
$\chi^2 = 24 \times 3.3^2/5.4^2 = 8.963 < \chi^2_{\text{critical}}$. So we reject
the null hypothesis. The file standard deviation has
indeed been reduced.
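The arithmetic in this procedure can be checked with a short Python sketch. The function name `chi_square_statistic` is a hypothetical helper introduced here, and the critical value 15.659 is simply the table value quoted above, not something the code computes.

```python
def chi_square_statistic(n, s, sigma0):
    """Return (n - 1) * s**2 / sigma0**2 for a sample of size n.

    n      : sample size
    s      : sample standard deviation
    sigma0 : standard deviation assumed under the null hypothesis
    """
    return (n - 1) * s**2 / sigma0**2


# Example 1: 25 files, sample SD 3.3 kb, previous SD 5.4 kb.
stat = chi_square_statistic(n=25, s=3.3, sigma0=5.4)

# Left-tail critical value taken from the chi-square table
# (24 degrees of freedom, area 0.9 to the right).
critical = 15.659

# Left-tail test: reject the null hypothesis when the statistic
# falls below the critical value.
reject_null = stat < critical

print(f"chi-square = {stat:.3f}, reject H0: {reject_null}")
# prints "chi-square = 8.963, reject H0: True"
```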
The graph
• As discussed before, the $\chi^2$-distribution is
badly skewed. We accept the null hypothesis when
$\chi^2$ is more than the critical value of 15.659.
However, the value obtained is less than that, so
we accept the alternative hypothesis.
Example 2
• The standard deviation of the price of a tray of 30
eggs is 32.5 sen. A check is made on the price in 38
outlets in a town, and it is found that the standard
deviation is 42.4 sen. At the 95% level of confidence,
test the hypothesis that the standard deviation of
prices in that town is more than 32.5 sen.
Solution: The distribution we should use is
$(38-1)s^2/32.5^2 \sim \chi^2_{37}$.
• Unfortunately, the $\chi^2$ table does not provide for
37 degrees of freedom. We have to take the nearest
tabulated value, i.e. $\chi^2_{40}$.
Example 2 (Analysis)
• The test is on whether the prices vary more in that town than
in general. So our test is on the right tail.
Null hypothesis: $\sigma^2 = 32.5^2$;
Alternative hypothesis: $\sigma^2 > 32.5^2$.
Test statistic: $\chi^2 = 37s^2/32.5^2 \sim \chi^2_{37}$.
(Note: Even though we are reading the value for $\chi^2_{40}$, technically
we are still testing using $\chi^2_{37}$.)
At 95% confidence, $\alpha = 0.05$, $\chi^2_{0.05,37} \approx 55.758$ (read from the
row for 40 degrees of freedom). Hence the null
hypothesis will be accepted if $\chi^2 \le 55.758$.
In this case, the calculated value of $\chi^2$ is $37 \times 42.4^2/32.5^2 =
62.975 > \chi^2_{\text{critical}}$. So we reject the null hypothesis. The
standard deviation of the price of eggs in that town is higher than 32.5 sen.
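Evaluating the test statistic step by step (a check added here, not part of the original slide) gives a value of about 62.97, well above the critical value read from the table:

```latex
\[
\chi^2 = \frac{(38-1)\,s^2}{\sigma_0^2}
       = \frac{37 \times 42.4^2}{32.5^2}
       = \frac{37 \times 1797.76}{1056.25}
       \approx 62.97 > 55.758
\]
```

Here $\sigma_0 = 32.5$ sen denotes the standard deviation assumed under the null hypothesis; the symbol $\sigma_0$ is introduced for this check only.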
Example 3
• A florist orders dendrobium orchids with a
specified mean length of 85 cm and standard
deviation of 5.8 cm. When she receives a supply,
the measurement of 22 stalks shows a mean of about
85.5 cm, which is acceptable. But the standard
deviation is 6.7 cm, which differs from the
specified value. At the 95% level of confidence, test
the hypothesis that the standard deviation is not
what is expected.
Solution: The distribution for the test is
$(22-1)s^2/5.8^2 \sim \chi^2_{21}$.
Example 3 (Solution)
• Since we want to know if the SD is as expected, we shall
test on two tails.
Null hypothesis: $\sigma^2 = 5.8^2$;
Alternative hypothesis: $\sigma^2 \ne 5.8^2$.
Test statistic: $\chi^2 = 21s^2/5.8^2 \sim \chi^2_{21}$.
At 95% confidence, $\alpha = 0.05$, $\alpha/2 = 0.025$, $1-\alpha/2 = 0.975$,
$\chi^2_{0.025,21} = 35.479$, and $\chi^2_{0.975,21} = 10.283$. Hence we shall
accept the null hypothesis if $10.283 \le \chi^2 \le 35.479$.
$\chi^2 = 21 \times 6.7^2/5.8^2 = 28.023$, which is within the range of
critical values. Hence we accept the null hypothesis. This
means the standard deviation is consistent with the specified value.
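The two-tail decision in this example can be sketched in Python as follows. The helper name `two_tail_decision` is hypothetical, and both critical values are the table values quoted in the solution rather than values computed by the code.

```python
def two_tail_decision(stat, lower, upper):
    """Return True when H0 is rejected, i.e. stat lies outside [lower, upper]."""
    return stat < lower or stat > upper


# Example 3: 22 stalks, sample SD 6.7 cm, specified SD 5.8 cm.
n, s, sigma0 = 22, 6.7, 5.8
stat = (n - 1) * s**2 / sigma0**2

# Table values for 21 degrees of freedom at alpha/2 = 0.025 on each side.
lower, upper = 10.283, 35.479

reject_null = two_tail_decision(stat, lower, upper)

print(f"chi-square = {stat:.3f}, reject H0: {reject_null}")
# prints "chi-square = 28.023, reject H0: False"
```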
Example 4
Example 5
Ratio of two variances
• When we compare the variances from two
samples, the correct measure to take is the ratio of
the two, using the F-distribution. As we saw in 3B,
when the two population variances are equal,
$s_1^2/s_2^2 \sim F_{\nu_1,\nu_2}$, where $n_1$ and $n_2$ are the sizes of the
samples, $\nu_1 = n_1-1$ and $\nu_2 = n_2-1$.
• Similar to the test of one sample variance against
the population, we may have a one-tail test (on the
right or left) or a two-tail test.
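F tables commonly list only upper-tail values, so a left-tail critical value is usually obtained from the reciprocal relation sketched below. The notation $F_{\alpha,\nu_1,\nu_2}$ for the value with area $\alpha$ to its right is an assumption, chosen to match the way the tables are read in the examples of this chapter.

```latex
\[
P\left(F_{\nu_1,\nu_2} > F_{\alpha,\nu_1,\nu_2}\right) = \alpha,
\qquad
F_{1-\alpha,\nu_1,\nu_2} = \frac{1}{F_{\alpha,\nu_2,\nu_1}}
\]
```

Note that the two degrees of freedom swap places on the right-hand side.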
Example 6
• In order to compare the variation of income
among the workers in two categories, a survey is
conducted among 15 workers from category A and
31 from category C. It turns out that the standard
deviation for A is RM 152.08 and that for C is RM
116.36. At the 95% level of confidence, test the
hypothesis that the variation in incomes for the two
groups is the same.
Solution: We shall use the F-distribution with 14
numerator and 30 denominator degrees of freedom
for the test, i.e. $s_1^2/s_2^2 \sim F_{14,30}$.
Example 6 (Solution)
• We run a two-tail test to find out if the variances are
different.
Null hypothesis: $\sigma_1^2 = \sigma_2^2$;
Alternative hypothesis: $\sigma_1^2 \ne \sigma_2^2$.
Test statistic: $F = s_1^2/s_2^2 \sim F_{14,30}$.
At 95% confidence, $\alpha = 0.05$, $\alpha/2 = 0.025$, $1-\alpha/2 = 0.975$.
From the table, $F_{0.025,14,30} \approx 2.31$ (strictly, this is the
tabulated value for $F_{0.025,15,30}$). However, for $F_{0.975,14,30}$, we need to
calculate the value as follows:
Example 6 (contd)
• $F_{0.025,30,14} = 2.73$.
• $F_{0.975,14,30} = 1/F_{0.025,30,14} = 1/2.73 = 0.366$.
• Hence we shall accept the null hypothesis if
$0.366 \le F_{\text{calculated}} \le 2.31$.
• $F_{\text{calculated}} = s_1^2/s_2^2 = 152.08^2/116.36^2 = 1.708$.
This falls within the critical range. Hence we accept
the null hypothesis. The variances (and hence the
standard deviations) of the two categories of
workers are not significantly different.
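As a sketch, the same two-tail F test can be reproduced in a few lines of Python. The variable names are illustrative, and the two table values (2.31 and 2.73) are taken from the solution above rather than computed.

```python
# Example 6: category A (15 workers) and category C (31 workers).
s1, s2 = 152.08, 116.36

# Test statistic: ratio of the two sample variances.
f_stat = s1**2 / s2**2

# Upper critical value, read from the table as in the solution.
upper = 2.31

# Lower critical value from the reciprocal relation
# F(0.975; 14, 30) = 1 / F(0.025; 30, 14), with F(0.025; 30, 14) = 2.73.
lower = 1 / 2.73

# Two-tail test: reject H0 only if the ratio falls outside [lower, upper].
reject_null = f_stat < lower or f_stat > upper

print(f"F = {f_stat:.3f}, lower = {lower:.3f}, reject H0: {reject_null}")
# prints "F = 1.708, lower = 0.366, reject H0: False"
```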
Example 7
• A medical director claims that the standard
deviation in the time taken for treating patients has
been cut down since he introduced a new
procedure. The record shows that for 21 patients
treated under the new procedure, the standard
deviation of treatment time is 6.8 minutes.
In contrast, the standard deviation for 16 patients
was 9.2 minutes before the procedure was used. At
the 95% level of confidence, can you accept his
claim?
Solution: Let $s_1$ and $s_2$ represent the standard
deviations before and after the procedure was
introduced, i.e. $s_1 = 9.2$ and $s_2 = 6.8$.
Example 7 (Solution)
Null hypothesis: $\sigma_2^2 = \sigma_1^2$;
Alternative hypothesis: $\sigma_2^2 < \sigma_1^2$.
Test statistic: $F = s_2^2/s_1^2 \sim F_{20,15}$.
This is a one-tail test. As we are testing for reduction, it
means we are looking at the left tail. At 95% confidence,
$\alpha = 0.05$, so $1-\alpha = 0.95$.
From the table, $F_{0.05,15,20} = 2.20$. So $F_{0.95,20,15} = 1/2.20 =
0.455$. Hence we shall accept the null hypothesis if
$F_{\text{calculated}} \ge 0.455$, and reject it otherwise.
$F_{\text{calculated}} = 6.8^2/9.2^2 = 0.546 > F_{\text{critical}}$. Hence we accept the
null hypothesis. The change in the standard deviation is not
significant at $\alpha = 0.05$.
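Written out in full, the left-tail critical value and the test statistic for this example are as follows. This is a worked check using only the figures quoted in the solution.

```latex
\[
F_{0.95,20,15} = \frac{1}{F_{0.05,15,20}} = \frac{1}{2.20} \approx 0.455
\]
\[
F_{\text{calculated}} = \frac{s_2^2}{s_1^2}
  = \frac{6.8^2}{9.2^2}
  = \frac{46.24}{84.64}
  \approx 0.546 > 0.455
\]
```

Because the calculated ratio does not fall below the left-tail critical value, the observed reduction is not large enough to reject the null hypothesis.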
Example 8
• A technician complains that the standard deviation of the
ICs produced since the temperature in the production room
was lowered by 2 °C has actually gone up. This affects the
quality of the ICs. A measurement shows that the standard
deviation of 12 new ICs is 32.2 μm, while 15 ICs from
the previous batch have a standard deviation of 27.4 μm. At
the 95% confidence level, can we support the technician?
Solution: Let $s_1$ and $s_2$ represent the standard deviations before
and after the temperature was reduced.
Null hypothesis: $\sigma_2^2 = \sigma_1^2$;
Alternative hypothesis: $\sigma_2^2 > \sigma_1^2$.
Example 8 (Solution)
Test statistic: $F = s_2^2/s_1^2 \sim F_{11,14}$.
This is a one-tail test on the right. At 95% confidence,
$\alpha = 0.05$. So we shall look for $F_{0.05,11,14}$.
However, the table does not provide a value for $F_{0.05,11,14}$, so
we read the values for $F_{0.05,10,14}$ (2.60) and $F_{0.05,12,14}$ (2.53)
and take the average, giving 2.57.
Hence we shall accept the null hypothesis if $F_{\text{calculated}} \le 2.57$,
and reject it otherwise.
$F_{\text{calculated}} = 32.2^2/27.4^2 = 1.381 < F_{\text{critical}}$. So the null
hypothesis is accepted. The data do not support the technician's
claim that the standard deviation has gone up.
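Taking the average of the two neighbouring table entries is the same as linear interpolation at the midpoint. The sketch below makes that step explicit; `interpolate` is a hypothetical helper, and the table values 2.60 and 2.53 are those quoted in the solution.

```python
def interpolate(x, x0, y0, x1, y1):
    """Linearly interpolate the value at x between (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Table values for 14 denominator degrees of freedom at alpha = 0.05:
# numerator df 10 gives 2.60, numerator df 12 gives 2.53.
critical = interpolate(11, 10, 2.60, 12, 2.53)

# Example 8: 12 new ICs (SD 32.2) against 15 earlier ICs (SD 27.4).
f_stat = 32.2**2 / 27.4**2

# Right-tail test: reject H0 only if the ratio exceeds the critical value.
reject_null = f_stat > critical

print(f"F = {f_stat:.3f}, critical = {critical:.3f}, reject H0: {reject_null}")
# prints "F = 1.381, critical = 2.565, reject H0: False"
```

The interpolated value 2.565 is what the solution rounds to 2.57; the conclusion is the same either way.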