Engineering Statistics
Chapter 3
Distribution of Samples
Distribution of sample statistics
3B - Variance
Sample Variance
• It is possible, in fact easy, to estimate the mean from a
sample, because it makes sense to expect the sample
mean to be close to the population mean. This is
true even for small samples.
• However, when it comes to the variance, the value is
found to change quite a lot, depending
on the size of the sample.
• It has been found that, if the size of a sample is n,
then the ratio of the sample variance $s^2$ to the
population variance $\sigma^2$, multiplied by n–1, follows the
$\chi^2$-distribution with n–1 degrees of freedom,
i.e. $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$.
Introducing $\chi^2$
• Statisticians found that the value of the variance in a
sample follows a certain distribution, called the $\chi^2$-distribution.
• This distribution is highly skewed to the right. Its
shape depends very much on n, the size of the
sample.
• It is very seldom that we are interested in the
probability of a sample variance being of a certain
value. Rather, we are usually more interested in,
say, the lower and upper limit of the variance.
The Graph of $\chi^2$
$\chi^2$ tables
• The UTM $\chi^2$ tables list the values of $\chi^2$ at
$\alpha$ = 0.001, 0.005, 0.010, 0.025, 0.05, 0.10, 0.25,
0.5, 0.70, 0.90, 0.95, 0.975, 0.99, 0.995 for
n = 1, 2, …, 120.
• Unlike the t-distributions, $\chi^2$ distributions
are highly skewed. The table gives the
probabilities on the left and right ends of the
distributions.
$\chi^2$ table: $P(\chi^2 \ge k) = \alpha$
Case $\alpha < 0.5$: the value of k is large.
Case $\alpha > 0.5$: the value of k is small.
Interpreting $\chi^2$ values

$\alpha$        0.001   0.005   0.010   0.025  …  0.975  0.990  0.995
$\nu$ = 1    10.827   7.879   6.635   5.024  …  0.001  0.000  0.000
$\nu$ = 3    16.266  12.838  11.345   9.348  …  0.216  0.115  0.072
$\nu$ = 6    22.457  18.548  16.812  14.449  …  1.237  0.872  0.676
Example: The values of $\chi^2$ are read separately for $\alpha$ near 0 and
$\alpha$ near 1.
$\nu$ = 1: $P(\chi^2 > 7.879) = 0.005$;
$P(\chi^2 > 0.000) = 0.995$.
$\nu$ = 3: $P(\chi^2 > 12.838) = 0.005$;
$P(\chi^2 > 0.072) = 0.995$.
$\nu$ = 6: $P(\chi^2 > 16.812) = 0.010$;
$P(\chi^2 > 0.872) = 0.990$.
Example 1
The standard deviation of the life of a certain car
battery is 5.8 months. A repair shop just received a
sample of 7 batteries. Find the 95% confidence
interval of the standard deviation of the sample.
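Solution: $(7-1)s^2/\sigma^2 \sim \chi^2_6$.
At 95% confidence, $\alpha/2 = 0.025$ and $1-\alpha/2 = 0.975$.
From the table, we read $\chi^2_{0.025,6} = 14.449$ and $\chi^2_{0.975,6} = 1.237$.
So $1.237 \le 6s^2/\sigma^2 \le 14.449$ with $\sigma = 5.8$
$\Rightarrow 2.63 \le s \le 9.00$.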
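The arithmetic of Example 1 can be reproduced with a short sketch. The function name `sample_sd_interval` is a hypothetical helper introduced here for illustration; the two $\chi^2$ values are the table readings for 6 degrees of freedom quoted in the text, not computed by the code.

```python
import math

def sample_sd_interval(sigma, n, chi2_low, chi2_high):
    """Range of the sample SD s implied by
    chi2_low <= (n-1) s^2 / sigma^2 <= chi2_high."""
    df = n - 1
    low = sigma * math.sqrt(chi2_low / df)
    high = sigma * math.sqrt(chi2_high / df)
    return low, high

# Table readings for 6 degrees of freedom, copied from the text.
low, high = sample_sd_interval(sigma=5.8, n=7, chi2_low=1.237, chi2_high=14.449)
print(f"{low:.2f} <= s <= {high:.2f}")
```

Solving the inequality for s simply multiplies each table value by $\sigma^2/(n-1)$ and takes the square root.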
Example 2
A pharmaceutical company claims that the standard
deviation of its 200 mg Vitamin C tablets is 24 mg
or less. If we check 21 such tablets, what is the
90% confidence interval of the standard deviation?
Solution: $(21-1)s^2/\sigma^2 \sim \chi^2_{20}$.
At 90% confidence, $\alpha/2 = 0.05$ and $1-\alpha/2 = 0.95$.
From the table, we read $\chi^2_{0.05,20} = 31.410$ and $\chi^2_{0.95,20} = 10.851$.
So $10.851 \le 20s^2/24^2 \le 31.410$
$\Rightarrow 17.68\ \text{mg} \le s \le 30.08\ \text{mg}$.
Example 3
• The annual report of HHH Restaurant
shows that the mean sales of its franchises for
the last quarter is RM 5.6 m, with a
standard deviation of RM 1.25 m.
Estimate the 95% confidence intervals for the
(i) mean, and
(ii) standard deviation
for 20 restaurants managed by Ali & Co.
Example 3 (i) Solution
• Since the mean and standard deviation of the
population are given, we shall use the normal
distribution
$\bar{X} \sim N(5.6,\ 1.25^2/20)$
to model the sample mean. At 95% confidence,
$\alpha/2 = 0.025$ and $Z_{0.025} = 1.96$. So the interval for the
sample mean is
$5.6 - 1.96\sqrt{1.25^2/20} \le \bar{X} \le 5.6 + 1.96\sqrt{1.25^2/20}$
$\Rightarrow 5.05 \le \bar{X} \le 6.15$.
The range is from RM 5.05 m to RM 6.15 m.
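As a check on part (i), the sketch below recomputes the interval for the sample mean. It obtains the z-value from Python's `statistics.NormalDist` instead of a printed table, which is a substitution made here for convenience; `sample_mean_interval` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def sample_mean_interval(mu, sigma, n, confidence):
    """Central interval for the mean of n observations from N(mu, sigma^2)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    half_width = z * sigma / math.sqrt(n)
    return mu - half_width, mu + half_width

# Values from Example 3: mean RM 5.6 m, SD RM 1.25 m, 20 restaurants.
low, high = sample_mean_interval(mu=5.6, sigma=1.25, n=20, confidence=0.95)
print(f"RM {low:.2f} m to RM {high:.2f} m")
```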
Example 3 (ii) Solution
• We model the variance using $(n-1)s^2/\sigma^2 \sim \chi^2_{19}$.
At 95% confidence, $\alpha/2 = 0.025$ and $1-\alpha/2 = 0.975$.
From the $\chi^2$-table, we have $\chi^2_{0.025,19} = 32.852$
and $\chi^2_{0.975,19} = 8.907$.
So the inequality is $8.907 \le 19 s^2/1.25^2 \le 32.852$
$\Rightarrow 0.7325 \le s^2 \le 2.7016$
$\Rightarrow 0.856 \le s \le 1.644$.
The range for the standard deviation is RM 0.856
m to RM 1.644 m.
Population variance from Sample variance
• As for the mean, we usually need to estimate the
variance of the population from a sample. In this
case, we use the same $\chi^2_{n-1}$ distribution for
$(n-1)s^2/\sigma^2$.
• The interval for $\sigma^2$ can be obtained directly
from the inequality
$\chi^2_{1-\alpha/2,n-1} \le (n-1)s^2/\sigma^2 \le \chi^2_{\alpha/2,n-1}$;
or we can use the inverse inequality
$1/\chi^2_{\alpha/2,n-1} \le \sigma^2/[(n-1)s^2] \le 1/\chi^2_{1-\alpha/2,n-1}$.
The result is the same.
Example 4 – Using $(n-1)s^2/\sigma^2$
From 10 food samples, it was found that
the mean content of a certain poison is 13.5 g
with standard deviation 3.7 g. At the 90% level, find
the confidence intervals of the mean and SD of the
poison content of the food.
Solution: For the mean, we shall use the t-distribution
with 9 (=10–1) degrees of freedom since the
sample size is small.
Mean: At the 90% level, $\alpha = 0.1$, $\alpha/2 = 0.05$. Referring to
Table 7, $t_{0.05,9} = 1.833$. Hence the mean should lie
between $13.5 - 1.833 \times 3.7/\sqrt{10}$ and $13.5 + 1.833 \times 3.7/\sqrt{10}$.
So we conclude that the mean is between 11.36 g
and 15.64 g.
Variance: Using the standard symbols, we have
$(n-1)s^2/\sigma^2 \sim \chi^2_9$. At the 90% level, $\alpha/2 = 0.05$ and $1-\alpha/2 = 0.95$.
From the table, we find $\chi^2_{0.05,9} = 16.919$ and
$\chi^2_{0.95,9} = 3.325$.
So $3.325 \le 9 \times 3.7^2/\sigma^2 \le 16.919$. From this, we obtain
$2.70\ \text{g} \le \sigma \le 6.09\ \text{g}$.
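The interval for the mean in this example is a plain t-interval, sketched below. The critical value 1.833 is the table reading quoted in the solution, not computed here, and `t_interval` is a hypothetical helper name.

```python
import math

def t_interval(mean, s, n, t_crit):
    """Interval mean +/- t_crit * s / sqrt(n) for a small sample."""
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width

# Example 4: 10 samples, mean 13.5 g, SD 3.7 g, t(0.05, 9) = 1.833 from the table.
low, high = t_interval(mean=13.5, s=3.7, n=10, t_crit=1.833)
print(f"{low:.2f} g to {high:.2f} g")
```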
Example 4 – Using $\sigma^2/[(n-1)s^2]$
• Instead of using the distribution for $(n-1)s^2/\sigma^2$,
we can use the form $\sigma^2/[(n-1)s^2] \sim 1/\chi^2_9$.
At the 90% level, this again gives $\alpha/2 = 0.05$ and
$1-\alpha/2 = 0.95$.
The relation for the interval is
$1/\chi^2_{0.05,9} \le \sigma^2/[(n-1)s^2] \le 1/\chi^2_{0.95,9}$.
From the table, we have $1/16.919 \le \sigma^2/(9 \times 3.7^2) \le 1/3.325$.
From the inequality, we obtain the same range
$2.70\ \text{g} \le \sigma \le 6.09\ \text{g}$.
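That the two forms agree can be confirmed numerically. In this sketch the function names are hypothetical, and the two $\chi^2$ values are the 9-degree-of-freedom table readings quoted in Example 4.

```python
import math

def sigma_interval_direct(s, n, chi2_small, chi2_large):
    """Solve chi2_small <= (n-1)s^2/sigma^2 <= chi2_large for sigma."""
    ss = (n - 1) * s ** 2
    return math.sqrt(ss / chi2_large), math.sqrt(ss / chi2_small)

def sigma_interval_inverse(s, n, chi2_small, chi2_large):
    """Solve 1/chi2_large <= sigma^2/((n-1)s^2) <= 1/chi2_small for sigma."""
    ss = (n - 1) * s ** 2
    return math.sqrt(ss * (1 / chi2_large)), math.sqrt(ss * (1 / chi2_small))

# Example 4: n = 10, s = 3.7 g; table values for 9 degrees of freedom.
direct = sigma_interval_direct(3.7, 10, chi2_small=3.325, chi2_large=16.919)
inverse = sigma_interval_inverse(3.7, 10, chi2_small=3.325, chi2_large=16.919)
print(f"direct : {direct[0]:.2f} g to {direct[1]:.2f} g")
print(f"inverse: {inverse[0]:.2f} g to {inverse[1]:.2f} g")
```

In both forms the larger table value produces the lower limit of $\sigma$, because $\sigma^2$ sits in the denominator of the original statistic.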
Example 5
• Consumers complain that the price of food
varies a lot depending on where you live. In
order to ascertain the variation of the price
of a plate of fried rice, a survey is made at
75 stalls across the country. The standard
deviation turns out to be 68 sen. Based on
this survey, estimate the standard deviation
of the price of a plate of fried rice for the
whole country at the level of 95%.
Example 5 (Solution)
• We model the standard deviation using the
$\chi^2_{74}$ distribution: $(n-1)s^2/\sigma^2 \sim \chi^2_{74}$. However, the
table does not provide for $\chi^2_{74}$. The alternative is
to use the nearest value, which is $\chi^2_{75}$.
• At 95%, $\alpha/2 = 0.025$ and $1-\alpha/2 = 0.975$.
From the table, we read the values of $\chi^2_{0.025,75}$ and $\chi^2_{0.975,75}$.
• Note: When $\nu$ is large, there is little difference
between the $\chi^2$ value for one $\nu$ and the next. Using $\chi^2_{75}$
instead of $\chi^2_{74}$ will not cause any appreciable discrepancy.
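The note about large $\nu$ can be illustrated without a table. The sketch below swaps in the Wilson-Hilferty cube-root normal approximation for $\chi^2$ critical values; this is a different technique from the table lookup used in the text, it is reasonably accurate only for larger $\nu$, and the function names are hypothetical.

```python
from statistics import NormalDist

def chi2_upper_tail_approx(alpha, nu):
    """Approximate k such that P(chi-square with nu d.f. >= k) = alpha.

    Wilson-Hilferty approximation, not the table lookup of the text.
    """
    z = NormalDist().inv_cdf(1 - alpha)  # upper-tail standard normal point
    c = 2 / (9 * nu)
    return nu * (1 - c + z * c ** 0.5) ** 3

def relative_gap(alpha, nu_a=74, nu_b=75):
    """Relative difference between the critical values for nu_a and nu_b."""
    a = chi2_upper_tail_approx(alpha, nu_a)
    b = chi2_upper_tail_approx(alpha, nu_b)
    return abs(b - a) / a

for alpha in (0.025, 0.975):
    print(alpha, f"{relative_gap(alpha):.1%} apart")
```

Under this approximation the critical values for 74 and 75 degrees of freedom differ by less than two percent at both tails, which supports using the nearest tabulated row.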
Example 6
• In a health screening, 14 students have their
weights and heights taken. Their BMI values are
calculated as follows:
32  25  26  22  18  28  35
25  18  22  29  33  20  26
Based on this set of data, find the 90% confidence
interval for the mean and standard deviation of the
BMI for all students.
Note: In this case, the raw sample data are given. We
are to infer on the population parameters based on
the sample data.
Example 6 (Solution)
We first find the mean and SD using the calculator as
follows:
First put your calculator in SD mode. Enter each
number using the M+ (called DATA) key.
After that, tap SHIFT, 2. You see three displays: $\bar{x}$,
$x\sigma_n$ and $x\sigma_{n-1}$. The two SDs are called the
population and sample SD respectively. Since you
obtain the data from a sample, you need to
choose the sample SD. Thus:
mean: $\bar{x} = 25.64$,
standard deviation: $s = 5.37$.
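For readers without the calculator, the same summary values can be obtained with Python's standard `statistics` module. This is a sketch alongside the calculator steps, not a replacement for them; `stdev` uses the divisor n–1 (sample SD) and `pstdev` uses the divisor n (population SD).

```python
import statistics

# BMI values of the 14 students in Example 6.
bmi = [32, 25, 26, 22, 18, 28, 35, 25, 18, 22, 29, 33, 20, 26]

mean = statistics.mean(bmi)
sample_sd = statistics.stdev(bmi)       # divisor n - 1, the one needed here
population_sd = statistics.pstdev(bmi)  # divisor n, shown only for contrast

print(f"mean = {mean:.2f}")
print(f"sample SD = {sample_sd:.2f}")
print(f"population SD = {population_sd:.2f}")
```

The sample SD is always slightly larger than the population SD computed from the same numbers, since it divides by n–1 instead of n.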
Example 6 (Solution)
I. Confidence interval for the mean:
The sample size 14 is small, so we model the
population mean µ using the t-distribution with
13 degrees of freedom.
At 90% confidence, $\alpha = 0.1$, $\alpha/2 = 0.05$, $t_{0.05,13} = 1.771$.
So the confidence interval for the mean is
$25.64 - 1.771\sqrt{5.37^2/14}$ to $25.64 + 1.771\sqrt{5.37^2/14}$
$\Rightarrow$ 23.10 to 28.18.
Example 6 (contd)
II. Confidence interval for the standard deviation
(using the variance)
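For the variance, the model is $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$,
which in this case is $13 \times 5.37^2/\sigma^2 \sim \chi^2_{13}$.
At 90% confidence, $\alpha = 0.1$, $\alpha/2 = 0.05$, and $1-\alpha/2 = 0.95$.
$\chi^2_{0.05,13} = 22.362$, $\chi^2_{0.95,13} = 5.892$.
This means that $5.892 \le 13 \times 5.37^2/\sigma^2 \le 22.362$
$\Rightarrow 4.094 \le \sigma \le 7.977$.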
Ratio of two variances
• When variances $s_1^2$ and $s_2^2$ are obtained
from two samples, either of the same
population or from two comparable
populations, then the ratio of the variances
$s_1^2/s_2^2$ follows the F-distribution with
$\nu_1$ and $\nu_2$ degrees of freedom:
$s_1^2/s_2^2 \sim F_{\nu_1,\nu_2}$
F-distribution
• The F-distribution has two parameters, $\nu_1$ called the
numerator and $\nu_2$ the denominator degrees of freedom.
• Because the F-distributions are very strongly
skewed, depending on the degrees of freedom,
tables are given only for the right tail at $\alpha$ = 0.001,
0.01, 0.025, and 0.05.
• We need to determine F values for 0.999, 0.99,
0.975 and 0.95 ourselves, using the fact that if
$s_1^2/s_2^2 \sim F_{\nu_1,\nu_2}$, then $s_2^2/s_1^2 \sim F_{\nu_2,\nu_1}$.
Reading the F-distribution table
• To find the value of $F_{0.05,5,7}$, say, you first
look for the 0.05 table. Next you read the
top row. This shows the numerator values.
Locate 5.
• On the left, the first column shows the
denominators. Locate 7. On this row, under
the numerator 5, we see 3.97. So $F_{0.05,5,7} = 3.97$.
Obtaining the F-value for $1-\alpha$
• We note that $s_1^2/s_2^2 \sim F_{\nu_1,\nu_2} \Leftrightarrow s_2^2/s_1^2 \sim F_{\nu_2,\nu_1}$.
• From this relation, we obtain $F_{1-\alpha,\nu_1,\nu_2}$ as
$1/F_{\alpha,\nu_2,\nu_1}$.
• For example, $F_{0.01,7,5} = 10.46$, so $F_{0.99,5,7} = 1/10.46 = 0.0956$.
• Conversely, if you need $F_{0.95,4,8}$, then read
$F_{0.05,8,4} = 6.04$ $\Rightarrow$ $F_{0.95,4,8} = 1/6.04 = 0.166$.
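The reciprocal rule can be written as a small lookup, sketched below. The dictionary `RIGHT_TAIL_F` is a hypothetical stand-in for the printed table and holds only four right-tail values that are quoted in the worked examples of this chapter, keyed as (α, numerator d.f., denominator d.f.).

```python
# Right-tail F values quoted in this chapter's worked examples,
# keyed by (alpha, numerator d.f., denominator d.f.).
RIGHT_TAIL_F = {
    (0.025, 15, 25): 2.41,
    (0.025, 25, 15): 2.69,
    (0.05, 7, 9): 3.29,
    (0.05, 9, 7): 3.68,
}

def left_tail_f(alpha, nu1, nu2, table=RIGHT_TAIL_F):
    """F value for probability 1 - alpha with (nu1, nu2) d.f.,
    obtained as 1 / F(alpha, nu2, nu1): note the swapped degrees."""
    return 1.0 / table[(alpha, nu2, nu1)]

f_975 = left_tail_f(0.025, 15, 25)  # uses F(0.025, 25, 15) = 2.69
f_95 = left_tail_f(0.05, 7, 9)      # uses F(0.05, 9, 7) = 3.68
print(round(f_975, 3), round(f_95, 3))
```

The point to watch is the swap: the left-tail value for (ν1, ν2) comes from the right-tail entry for (ν2, ν1).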
Example 7
(i) The standard deviation of sugar levels among
men is 1.23 units. Find the 95% confidence
interval of the standard deviation for the sugar
levels for a sample of 24 men.
(ii) The standard deviation of sugar levels among a
sample of 16 men is 1.23 units. Find the 95%
confidence interval of the standard deviation for
the sugar levels for another sample of 26 men.
7 (i) Solution
• This is a revision example of the $\chi^2$-distribution.
We note that the population
variance $\sigma^2$ is $1.23^2$. By theory, $(24-1)s^2/\sigma^2 \sim \chi^2_{23}$.
• At 95% confidence, $\alpha = 0.05$, $\alpha/2 = 0.025$, and
$1-\alpha/2 = 0.975$. $\chi^2_{0.025,23} = 38.076$ and $\chi^2_{0.975,23} = 11.689$.
So $11.689 \le 23s^2/1.23^2 \le 38.076$
$\Rightarrow 0.877 \le s \le 1.583$.
7 (ii) Solution
• Here $1.23^2$ is the first sample variance $s_1^2$. By
theory, $s_1^2/s_2^2 \sim F_{15,25}$.
• At 95% confidence, $\alpha = 0.05$, $\alpha/2 = 0.025$, and
$1-\alpha/2 = 0.975$.
• $F_{0.025,15,25} = 2.41$;
• To calculate $F_{0.975,15,25}$, we first read $F_{0.025,25,15} = 2.69$.
Hence $F_{0.975,15,25} = 1/2.69 = 0.372$.
So $0.372 \le 1.23^2/s_2^2 \le 2.41$
$\Rightarrow 0.792 \le s_2 \le 2.017$.
Example 8
(i) The standard deviation of the monthly pay of a
group of 10 workers in a factory is RM 115.65.
What is the 90% confidence interval of the standard
deviation of the monthly pay of 8 workers in a
similar factory?
(ii) The Tourism Council finds the standard deviation
of the spending among 15 tourists at a resort to be
RM 223.45. Find the 98% confidence interval for
the standard deviation of spending among 20
tourists in a similar resort.
8 (i) Solution
• We take $115.65^2$ as the second sample variance $s_2^2$,
and we seek $s_1$. By theory, $s_1^2/s_2^2 \sim F_{7,9}$.
• At 90% confidence, $\alpha = 0.10$, $\alpha/2 = 0.05$, and
$1-\alpha/2 = 0.95$.
• $F_{0.05,7,9} = 3.29$;
• For $F_{0.95,7,9}$ we read $F_{0.05,9,7} = 3.68$ $\Rightarrow$ $F_{0.95,7,9} = 1/3.68 = 0.272$.
• So $0.272 \le s_1^2/115.65^2 \le 3.29$
$\Rightarrow 60.32 \le s_1 \le 209.77$.
8 (ii) Solution
• Again we take $223.45^2$ as the second sample variance $s_2^2$, and
we seek $s_1$. By theory, $s_1^2/s_2^2 \sim F_{19,14}$.
• At 98% confidence, $\alpha = 0.02$, $\alpha/2 = 0.01$, and $1-\alpha/2 = 0.99$.
• Unfortunately, the F-table does not give values for $F_{19,14}$, and
neither do we have $F_{14,19}$.
• In this case, we take the nearest value, i.e. $F_{0.01,20,15} = 3.37$, and
for $F_{0.99,19,14}$ we read $F_{0.01,15,20} = 3.09$ $\Rightarrow$ $F_{0.99,19,14} = 1/3.09 = 0.3236$.
• Hence $0.3236 \le s_1^2/223.45^2 \le 3.37$
$\Rightarrow 127.11 \le s_1 \le 410.20$.
At 98% confidence, the range of the standard deviation is RM
127.11 to RM 410.20.
Wide range for Variance
• We note that, unlike those for the mean, the confidence
intervals for the variance are rather wide. Increasing
the sample size does not significantly reduce the
range of the variance. This is the nature of things:
while variations in values cancel each other,
bringing the mean closer to the expected value, the
variation in the values themselves remains, thus causing large
variances.
• In fact, when we have an unexpectedly small range of
variance, we should suspect that some unusual factors
have caused the values to converge. This means the data
are not natural and are suspect.