IŞIKIE
IE 256- Engineering Statistics
Fundamental Sampling Distributions and Data Descriptions
IE256 Engineering Statistics – Summer 2010
Graphical Methods and Data Description
Summarizing or characterizing the nature of collections of data is very important.
Display of data can enhance statistical inference about scientific systems.
Stem-and-Leaf Plot
Consider the data on the “life” of 40 similar car batteries, recorded to the nearest tenth
of a year and given in the following table. The batteries are guaranteed to last 3 years.
Car Battery Life
2.2 4.1 3.5 4.5 3.2 3.7 3.0 2.6
3.4 1.6 3.1 3.3 3.8 3.1 4.7 3.7
2.5 4.3 3.4 3.6 2.9 3.3 3.9 3.1
3.3 3.1 3.7 4.4 3.2 4.1 1.9 3.4
4.7 3.8 3.2 2.6 3.9 3.0 4.2 3.5
First, we split each observation into two parts consisting of a stem and a leaf such
that the stem represents the digit preceding the decimal and the leaf corresponds to the
decimal part of the number. For example, for the number 3.7 the digit 3 is the stem and
the digit 7 is the leaf. These data can be represented using four stems, 1, 2, 3, and 4.
The leaves will be listed next to each stem.
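The splitting rule described above can be sketched in a few lines of Python. This is only an illustration, not part of the original slides; the function name `stem_and_leaf` is made up for the example, and it assumes every value is recorded to exactly one decimal place.

```python
from collections import defaultdict

# Battery lifetimes (years) from the table above.
battery_life = [
    2.2, 4.1, 3.5, 4.5, 3.2, 3.7, 3.0, 2.6,
    3.4, 1.6, 3.1, 3.3, 3.8, 3.1, 4.7, 3.7,
    2.5, 4.3, 3.4, 3.6, 2.9, 3.3, 3.9, 3.1,
    3.3, 3.1, 3.7, 4.4, 3.2, 4.1, 1.9, 3.4,
    4.7, 3.8, 3.2, 2.6, 3.9, 3.0, 4.2, 3.5,
]

def stem_and_leaf(values):
    """Group one-decimal values by integer part (stem) and tenths digit (leaf)."""
    plot = defaultdict(list)
    for v in values:
        tenths = round(v * 10)            # 3.7 -> 37; rounding avoids float truncation
        stem, leaf = divmod(tenths, 10)   # 37 -> stem 3, leaf 7
        plot[stem].append(leaf)
    return {stem: sorted(leaves) for stem, leaves in sorted(plot.items())}

plot = stem_and_leaf(battery_life)
for stem, leaves in plot.items():
    # One row per stem: stem | leaves | frequency
    print(stem, "|", "".join(map(str, leaves)), "|", len(leaves))
```

Each printed row gives the stem, its sorted leaves, and the number of leaves (the class frequency).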
Graphical Methods and Data Description
Stem-and-Leaf Plot of Battery Life
Stem   Leaf                        Frequency
1      69                          2
2      25669                       5
3      0011112223334445567778899   25
4      11234577                    8
The stem-and-leaf plot above contains only four stems and consequently does not
provide an adequate picture of the distribution. To remedy this we need to increase
the number of stems in our plot. One simple way to accomplish this is to divide
further each stem into two and then record the leaves 0, 1, 2, 3, and 4 opposite to
the first stem and the leaves 5, 6, 7, 8, and 9 opposite to the second stem.
Stem-and-Leaf Plot of Battery Life
Stem   Leaf              Frequency
1•     69                2
2*     2                 1
2•     5669              4
3*     001111222333444   15
3•     5567778899        10
4*     11234             5
4•     577               3
Graphical Methods and Data Description
Frequency Histogram
Using a frequency distribution is another effective way to summarize data.
The data are grouped into different classes or intervals and the number of data
points belonging to each interval reveals the frequency distribution.
Dividing each class frequency by the total number of observations, we obtain the
proportion of the set of observations in each of the classes.
The information provided by the relative frequency distribution in tabular form is
easier to grasp if presented graphically.
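As a rough sketch of the grouping step just described, the following Python snippet counts the battery lifetimes in classes of width 0.5 starting at 1.5 and divides by the number of observations. The function name and its parameters are illustrative assumptions, not something defined in the slides.

```python
# Battery lifetimes (years), recorded to one decimal place.
battery_life = [
    2.2, 4.1, 3.5, 4.5, 3.2, 3.7, 3.0, 2.6,
    3.4, 1.6, 3.1, 3.3, 3.8, 3.1, 4.7, 3.7,
    2.5, 4.3, 3.4, 3.6, 2.9, 3.3, 3.9, 3.1,
    3.3, 3.1, 3.7, 4.4, 3.2, 4.1, 1.9, 3.4,
    4.7, 3.8, 3.2, 2.6, 3.9, 3.0, 4.2, 3.5,
]

def relative_frequency(values, start=1.5, width=0.5, classes=7):
    """Return (class label, frequency, relative frequency) for each class."""
    counts = [0] * classes
    for v in values:
        # Work in tenths (integers) so class boundaries are exact.
        idx = (round(v * 10) - round(start * 10)) // round(width * 10)
        counts[idx] += 1
    n = len(values)
    rows = []
    for k, f in enumerate(counts):
        lo = start + k * width
        rows.append((f"{lo:.1f}-{lo + width - 0.1:.1f}", f, f / n))
    return rows

rows = relative_frequency(battery_life)
for label, f, rel in rows:
    print(f"{label}  f={f:2d}  relative frequency={rel:.3f}")
```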
Graphical Methods and Data Description
Relative Frequency Distribution of Battery Life
Class Interval   Class Midpoint   Frequency, f   Relative Frequency
1.5-1.9          1.7              2              0.050
2.0-2.4          2.2              1              0.025
2.5-2.9          2.7              4              0.100
3.0-3.4          3.2              15             0.375
3.5-3.9          3.7              10             0.250
4.0-4.4          4.2              5              0.125
4.5-4.9          4.7              3              0.075
[Figure: relative frequency histogram of battery life; horizontal axis: Battery Life (class midpoints 1.7 to 4.7); vertical axis: Relative Frequency (0 to 0.375)]
Graphical Methods and Data Description
Skewness of Data
Distributions can be classified according to their tails.
A distribution with equal tails is said to be symmetric, whereas
a distribution with a longer and/or heavier left tail is said to be skewed to the left and
one with a longer and/or heavier right tail is said to be skewed to the right.
A frequency histogram may be classified based on skewness as well.
[Figure: three histograms, respectively skewed to the left, symmetric, and skewed to the right]
Graphical Methods and Data Description
While the median divides the data into two, the lower and upper halves of the data,
we can divide the data into further parts.
Quartiles divide the data into four parts. The third quartile separates the upper
quarter (25%) of the data from the rest of the data, the second quartile is the
median, and the first quartile separates the lower quarter from the upper 75% of
the data.
For example consider a data set with 40 observations (n = 40).
One quarter of the data equals 10 observations. The first quartile needs to separate the
data into the lower 10 observations and the upper 30 observations; therefore the
midpoint of x(10) and x(11) constitutes the first quartile.
$$\text{First quartile} = \frac{x_{(10)} + x_{(11)}}{2}, \qquad \text{Second quartile (median)} = \frac{x_{(20)} + x_{(21)}}{2}, \qquad \text{Third quartile} = \frac{x_{(30)} + x_{(31)}}{2}$$
IE256 Engineering Statistics – Spring 2011
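The midpoint rule for a sample of 40 observations can be tried out on the battery data from the earlier slides. The sketch below is illustrative only: the helper `midpoint_quantile` is a hypothetical name, and it assumes that n times the requested percentage is a whole number of observations, as in the n = 40 case discussed here.

```python
# Battery lifetimes (years) from the stem-and-leaf example, n = 40.
battery_life = [
    2.2, 4.1, 3.5, 4.5, 3.2, 3.7, 3.0, 2.6,
    3.4, 1.6, 3.1, 3.3, 3.8, 3.1, 4.7, 3.7,
    2.5, 4.3, 3.4, 3.6, 2.9, 3.3, 3.9, 3.1,
    3.3, 3.1, 3.7, 4.4, 3.2, 4.1, 1.9, 3.4,
    4.7, 3.8, 3.2, 2.6, 3.9, 3.0, 4.2, 3.5,
]

def midpoint_quantile(values, percent):
    """Midpoint of x(k) and x(k+1), where k = n * percent / 100 is an integer."""
    x = sorted(values)
    n = len(x)
    if not 0 < percent < 100 or (n * percent) % 100 != 0:
        raise ValueError("n * percent / 100 must be a whole number below n")
    k = n * percent // 100
    return (x[k - 1] + x[k]) / 2      # x[k-1] is x(k) in 1-based notation

q1 = midpoint_quantile(battery_life, 25)      # midpoint of x(10) and x(11)
median = midpoint_quantile(battery_life, 50)  # midpoint of x(20) and x(21)
q3 = midpoint_quantile(battery_life, 75)      # midpoint of x(30) and x(31)
print(q1, median, q3)
```

The same helper covers percentiles whenever the required position is a whole number, for example `midpoint_quantile(battery_life, 95)`.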
Graphical Methods and Data Description
The data can even be further divided by computing percentiles of the distribution.
Percentiles divide the data into hundred parts. For example the 95th percentile
separates the highest 5% from the bottom 95% while the 10th percentile separates
the bottom 10% from the top 90%.
Consider a data set with 40 observations (n = 40). 5% of the data consists of 2
observations.
95th percentile = midpoint between x(38) and x(39)
10th percentile = midpoint between x(4) and x(5)
Graphical Methods and Data Description
Box-and-Whisker Plot
Box-and-whisker plot encloses the interquartile range of the data in a box that has
the median displayed within. The interquartile range is between the 75th percentile
(upper quartile) and 25th percentile (lower quartile). In addition to the box,
“whiskers” extend, showing the largest and smallest observations. Consider the
nicotine data (Exercise 1.21) given below:
1.09 1.92 2.31 1.79 2.28 1.74 1.47 1.97
0.85 1.24 1.58 2.03 1.70 2.17 2.55 2.11
1.86 1.90 1.68 1.51 1.64 0.72 1.69 1.85
1.82 1.79 2.46 1.88 2.08 1.67 1.37 1.93
1.40 1.64 2.09 1.75 1.63 2.37 1.75 1.69
min = 0.72, q1 = 1.64, median = 1.77, q3 = 2.00, max = 2.55
Graphical Methods and Data Description
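[Figure: box-and-whisker plot of the nicotine data on an axis from 0.50 to 2.50; the whiskers end at 0.72 and 2.55, the box spans 1.64 to 2.00, and the median is marked at 1.77]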
A variation called a box plot can show the viewer which observations may be
outliers.
One common procedure to detect outliers is to use a multiple of the interquartile
range. For example, if the distance from the box exceeds 1.5 times the interquartile
range (in either direction), the observation may be labeled as an outlier.
[Figure: box plot of the nicotine data on an axis from 0.50 to 2.50]
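The five-number summary and the 1.5 × IQR outlier rule can be checked numerically. The following is a minimal sketch, assuming the quartiles are taken as midpoints of neighbouring order statistics as on the quartile slides (which requires n to be divisible by 4); the function name is illustrative.

```python
# The 40 nicotine measurements of Exercise 1.21.
nicotine = [
    1.09, 1.92, 2.31, 1.79, 2.28, 1.74, 1.47, 1.97,
    0.85, 1.24, 1.58, 2.03, 1.70, 2.17, 2.55, 2.11,
    1.86, 1.90, 1.68, 1.51, 1.64, 0.72, 1.69, 1.85,
    1.82, 1.79, 2.46, 1.88, 2.08, 1.67, 1.37, 1.93,
    1.40, 1.64, 2.09, 1.75, 1.63, 2.37, 1.75, 1.69,
]

def five_number_summary(values):
    """min, q1, median, q3, max using the midpoint rule (n divisible by 4)."""
    x = sorted(values)
    n = len(x)
    if n % 4 != 0:
        raise ValueError("this sketch assumes n is divisible by 4")
    mid = lambda k: (x[k - 1] + x[k]) / 2   # midpoint of x(k) and x(k+1)
    return x[0], mid(n // 4), mid(n // 2), mid(3 * n // 4), x[-1]

lo, q1, med, q3, hi = five_number_summary(nicotine)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr   # anything below is flagged
upper_fence = q3 + 1.5 * iqr   # anything above is flagged
outliers = sorted(v for v in nicotine if v < lower_fence or v > upper_fence)
print(lo, q1, med, q3, hi)
print("possible outliers:", outliers)
```

The computed first quartile is 1.635, which the slide reports rounded as 1.64.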
Populations and Samples
Definition 8.1. A population consists of the totality of the observations with which
we are concerned.
Populations can be infinite or they might have a finite size.
We call a population “population f(x)” if each observation in a population is a value
of a random variable X having probability distribution f(x). Ex: Normal Population,
Binomial Population.
Populations and Samples
The statistician is interested in arriving at conclusions concerning a population
when it is impossible to observe the entire population.
Therefore, we must depend on a subset of observations, which brings us to the
notion of sampling.
Definition 8.2. A sample is a subset of a population.
To eliminate any bias in the sampling procedure, it is desirable to choose a random
sample.
Let Xi, i=1,2,…,n, represent the ith sample value we observe from a population f(x).
The random variables Xi will constitute a random sample with values x1, x2, …, xn if
the measurements are obtained by repeating the experiment n independent times
under the same conditions.
Populations and Samples
Definition 8.3. Let X1, X2, …, Xn be n independent random variables, each having
the same probability distribution f(x). We define X1, X2, …, Xn to be a random
sample of size n from the population f(x) and write its joint distribution as
$$f(x_1, x_2, \ldots, x_n) = f(x_1)\, f(x_2) \cdots f(x_n)$$
[Figure: normal population with mean µ, with the sample values X1, X2, X3, X4, X5 marked along the axis]
Some Important Statistics
Definition 8.4. Any function of the random variables constituting a random sample
is called a statistic.
In this course we are interested in the following major statistics
-Sample mean
-Sample variance
Definition 8.5. If X1, X2,…, Xn represent a random sample of size n, then the sample
mean is defined by the statistic
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
Definition 8.6. If X1, X2,…, Xn represent a random sample of size n, then the sample
variance is defined by the statistic
$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$$
Some Important Statistics
Definition. The sample mode is the element that is observed most often in the
sample.
Definition. Let X(i) be the i-th value when the sample data are sorted in increasing
order. Then the sample median is defined as follows:
$$\tilde{X} = \begin{cases} X_{((n+1)/2)} & \text{if } n \text{ is odd,} \\ \frac{1}{2}\left( X_{(n/2)} + X_{(n/2+1)} \right) & \text{if } n \text{ is even.} \end{cases}$$
Some Important Statistics
Example: Consider the following measurements, in liters for two samples of orange
juice bottled by companies A and B:
Sample A: 0.97 1.00 0.94 1.03 1.06
Sample B: 1.06 1.01 0.88 0.91 1.14
-Compute the sample means, the sample medians and the sample variances.
Definition 8.7. The sample standard deviation, denoted by S, is the positive square
root of the sample variance.
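One way to carry out the orange juice computation is with Python's standard `statistics` module, whose `variance` and `stdev` functions use the n − 1 divisor of Definition 8.6. This is a sketch for checking hand calculations, not part of the original example.

```python
from statistics import mean, median, stdev, variance

# Measurements in liters for the two samples of bottled orange juice.
sample_a = [0.97, 1.00, 0.94, 1.03, 1.06]
sample_b = [1.06, 1.01, 0.88, 0.91, 1.14]

summary = {}
for name, data in (("A", sample_a), ("B", sample_b)):
    summary[name] = {
        "mean": mean(data),
        "median": median(data),
        "variance": variance(data),   # divides by n - 1
        "std_dev": stdev(data),       # positive square root of the variance
    }
    print(name, summary[name])
```

Both samples have the same mean, but sample B is noticeably more variable, which is what the sample variance is meant to capture.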
Sampling Distributions
A statistic is a random variable that depends only on the observed sample, hence it
must have a probability distribution.
Definition 8.10. The probability distribution of a statistic is called a sampling
distribution.
Sampling Distribution of Means
Suppose that a random sample of n observations is taken from a normal
population with mean µ and variance σ². Each observation Xi, i = 1, 2, ..., n, of the
random sample will then have the same normal distribution as the population being
sampled. By the reproductive property of the normal distribution (see Theorem
7.11), we conclude that
$$\bar{X} = \frac{1}{n}\left( X_1 + X_2 + \cdots + X_n \right)$$
has a normal distribution with mean
$$\mu_{\bar{X}} = E(\bar{X}) = \frac{1}{n}\,(\underbrace{\mu + \mu + \cdots + \mu}_{n \text{ terms}}) = \mu$$
and variance
$$\sigma^2_{\bar{X}} = \operatorname{Var}(\bar{X}) = \frac{1}{n^2}\,(\underbrace{\sigma^2 + \sigma^2 + \cdots + \sigma^2}_{n \text{ terms}}) = \frac{\sigma^2}{n}.$$
If we are sampling from a population with unknown distribution, either finite or
infinite, the distribution of $\bar{X}$ will still be approximately normal with mean µ and
variance σ²/n, provided that the sample size is large. This amazing result is an immediate
consequence of the following theorem, called the central limit theorem.
Sampling Distribution of Means
Theorem 8.2. Central Limit Theorem: If $\bar{X}$ is the mean of a random sample of size n
taken from a population with mean µ and finite variance σ², then the limiting form
of the distribution of
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
as n → ∞, is the standard normal distribution, n(z; 0, 1).
The normal approximation for $\bar{X}$ will generally be good if n ≥ 30.
If n < 30 the approximation is good only if the population is not too different from
a normal distribution.
If the population is known to be normal, the sampling distribution of $\bar{X}$ will follow
a normal distribution exactly, no matter how small the size of the sample.
The sample size n = 30 is a guideline to use for the central limit theorem. However,
as the statement of the theorem implies, the presumption of normality on the
distribution of $\bar{X}$ becomes more accurate as n grows larger. (Figure 8.7 p246)
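A small simulation can make the theorem tangible. The sketch below is an illustration under stated assumptions, not something from the slides: the population is chosen to be exponential with mean 1 and variance 1 (clearly non-normal), the sample size is n = 30, and the seed and number of replications are arbitrary.

```python
import random
from math import sqrt
from statistics import fmean, variance

def simulate_sample_means(n, replications, seed=0):
    """Draw `replications` samples of size n from an Exp(1) population
    (mean 1, variance 1) and return the list of sample means."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n
            for _ in range(replications)]

n = 30
means = simulate_sample_means(n, replications=5000)

mean_of_means = fmean(means)      # should be close to mu = 1
var_of_means = variance(means)    # should be close to sigma^2 / n = 1/30

# Standardize each sample mean: Z = (xbar - mu) / (sigma / sqrt(n)).
z_values = [(m - 1.0) / (1.0 / sqrt(n)) for m in means]
share_within = sum(abs(z) < 1.96 for z in z_values) / len(z_values)

print(mean_of_means, var_of_means, share_within)
```

If the normal approximation is reasonable, roughly 95% of the standardized means should fall between −1.96 and 1.96.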
Sampling Distribution of Means
Example 8.6. An electrical firm manufactures light bulbs that have a length of life
that is approximately normally distributed, with mean equal to 800 hours and a
standard deviation of 40 hours. Find the probability that a random sample of 16
bulbs will have an average life of less than 775 hours.
Note: If we were to repeat this experiment (sample 16 bulbs and compute the
sample mean) many times, what proportion of the sample averages would be less
than 775 hours? How rare an event is $\bar{X} < 775$ when µ = 800? These are
questions regarding the distribution of the sample mean.
Note: The probability that a single light bulb lasts less than 775 hours is different from
the probability that the average life of 16 light bulbs is less than 775 hours.
Sampling Distribution of Means
[Figure: distribution of a single lifetime X, with µ = 800 and σ = 40; the area P(X < 775) lies to the left of 775]
[Figure: distribution of the sample mean $\bar{X}$, with $\mu_{\bar{X}} = 800$ and $\sigma_{\bar{X}} = 40/\sqrt{16} = 10$; the area $P(\bar{X} < 775)$ lies to the left of 775]
Sampling Distribution of Means
For a single bulb:
$$P(X < 775) = P\left( \frac{X - \mu}{\sigma} < \frac{775 - \mu}{\sigma} \right) = P\left( Z < \frac{775 - 800}{40} \right) = P(Z < -0.625) = 0.2660$$
For the average of 16 bulbs:
$$P(\bar{X} < 775) = P\left( \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{775 - \mu}{\sigma/\sqrt{n}} \right) = P\left( Z < \frac{775 - 800}{40/\sqrt{16}} \right) = P(Z < -2.5) = 0.0062$$
Only 62 out of 10000 samples (experiments) will result in a sample average less
than or equal to 775 hours if in fact the population mean is 800 hours and the
population standard deviation is 40 hours.
On the other hand, if µ, the population mean, truly were 785 hours, $\bar{X} < 775$ would
not be a rare event:
$$P(\bar{X} < 775 \mid \mu = 785) = P\left( Z < \frac{775 - 785}{40/\sqrt{16}} \right) = P(Z < -1.0) = 0.1587$$
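The three probabilities of the light bulb example can be reproduced with `statistics.NormalDist` from the Python standard library. The snippet is a sketch for checking table look-ups; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 800.0, 40.0, 16

single_bulb = NormalDist(mu, sigma)             # distribution of one lifetime X
sample_mean = NormalDist(mu, sigma / sqrt(n))   # distribution of the mean of 16 bulbs

p_single = single_bulb.cdf(775)                 # P(X < 775)
p_mean = sample_mean.cdf(775)                   # P(Xbar < 775) when mu = 800
p_mean_if_785 = NormalDist(785.0, sigma / sqrt(n)).cdf(775)  # same event if mu = 785

print(round(p_single, 4), round(p_mean, 4), round(p_mean_if_785, 4))
```

The standard deviation of the sample mean is σ/√n = 10, which is why the same threshold of 775 hours is far more extreme for the average than for a single bulb.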
Inferences on the population mean
One very important application of the central limit theorem is the determination of
reasonable values of the population mean µ. Topics such as hypothesis testing,
estimation, quality control, and others make use of the central limit theorem.
Example 8.7. An important manufacturing process produces cylindrical component
parts for the automotive industry. It is important that the process produces parts
having a mean diameter of 5.0 millimeters. The engineer involved conjectures that the
population mean is 5.0 millimeters. An experiment is conducted in which 100 parts
produced by the process are selected randomly and the diameter is measured on
each. It is known that the population standard deviation is σ = 0.1 millimeter. The
experiment indicates a sample average diameter of $\bar{x} = 5.027$ millimeters.
(a) Does this sample information appear to support or refute the engineer’s
conjecture?
(b) What is the probability that the sample mean deviates from the population mean by as much as 0.027?
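Sampling Distribution of Means
$$P\left( |\bar{X} - \mu| \ge 0.027 \right) = P\left( \bar{X} - \mu \le -0.027 \ \text{or}\ \bar{X} - \mu \ge 0.027 \right)$$
$$= P\left( \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le -\frac{0.027}{\sigma/\sqrt{n}} \ \text{or}\ \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \ge \frac{0.027}{\sigma/\sqrt{n}} \right)$$
$$= P\left( Z \le -\frac{0.027}{0.01} \ \text{or}\ Z \ge \frac{0.027}{0.01} \right)$$
$$= P(Z \le -2.7) + P(Z \ge 2.7) = 2P(Z \ge 2.7) = 2 \times 0.0035 = 0.007$$
[Figure: standard normal density on an axis from -3.0 to 3.0, with the two tail areas P(Z ≤ -2.7) and P(Z ≥ 2.7) marked]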
Sampling Distribution of the Difference between Two Averages
In the previous example, the engineer was interested in supporting a conjecture
regarding a single population mean. A far more important application involves two
populations. A scientist or engineer is interested in a comparative experiment in
which two manufacturing methods, 1 and 2, are to be compared.
Suppose we have two populations, the first with mean µ1 and variance σ1², and the
second with mean µ2 and variance σ2².
Let the statistic $\bar{X}_1$ be the mean of the random sample of size n1 from population 1.
Let the statistic $\bar{X}_2$ be the mean of the random sample of size n2 from population 2.
What can we say about the sampling distribution of the difference $\bar{X}_1 - \bar{X}_2$ for
repeated samples of size n1 and n2?
Sampling Distribution of the Difference between Two Averages
Theorem 8.3. If independent samples of size n1 and n2 are drawn at random from
two populations with means µ1 and µ2 and variances σ1² and σ2², respectively, then
the sampling distribution of the difference of means, $\bar{X}_1 - \bar{X}_2$, is approximately
normally distributed with mean and variance given by
$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2 \qquad \text{and} \qquad \sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$
Hence
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{(\sigma_1^2 / n_1) + (\sigma_2^2 / n_2)}}$$
is approximately a standard normal variable.
Remark. If both n1 and n2 are greater than or equal to 30, the normal
approximation for the distribution of $\bar{X}_1 - \bar{X}_2$ is very good when the underlying
distributions are not too far away from normal. However, even when n1 and n2 are
less than 30, the normal approximation is reasonably good except when the
populations are decidedly nonnormal. Of course, if both populations are normal,
then $\bar{X}_1 - \bar{X}_2$ has a normal distribution no matter what the sizes of n1 and n2 are.
Sampling Distribution of the Difference between Two Averages
Example 8.8. Two independent experiments are being run in which two different
types of paints are compared. Eighteen specimens are painted using type A and the
drying time, in hours, is recorded on each. The same is done with type B. The
populations are both approximately normal and the standard deviations are both
known to be 1.0 hour.
Assuming that the mean drying time is equal for the two types of paint, find
$P(\bar{X}_A - \bar{X}_B > 1.0)$, where $\bar{X}_A$ and $\bar{X}_B$ are average drying times for samples of
size nA = nB = 18.
[Figure: density of $\bar{X}_A - \bar{X}_B$ centered at µA − µB = 0 with standard deviation $\sigma_{\bar{X}_A - \bar{X}_B}$; the value 1.0 is marked on the axis]
Sampling Distribution of the Difference between Two Averages
From the sampling distribution of $\bar{X}_A - \bar{X}_B$, we know that the distribution is
approximately normal with mean and variance
$$\mu_{\bar{X}_A - \bar{X}_B} = \mu_A - \mu_B = 0, \qquad \sigma^2_{\bar{X}_A - \bar{X}_B} = \frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B} = \frac{1}{18} + \frac{1}{18} = \frac{1}{9}$$
$$z = \frac{1 - (\mu_A - \mu_B)}{\sqrt{1/9}} = \frac{1 - 0}{1/3} = 3$$
$$P(\bar{X}_A - \bar{X}_B > 1.0) = P(Z > z) = 1 - P(Z < z) = 1 - P(Z < 3.0) = 1 - 0.9987 = 0.0013$$
Sampling Distribution of the Difference between Two Averages
Example 8.9. The television tubes of manufacturer A have a mean life of 6.5 years
and a standard deviation of 0.9 year, while those of manufacturer B have a mean
lifetime of 6.0 years and a standard deviation of 0.8 year.
Population 1   Population 2
µ1 = 6.5       µ2 = 6.0
σ1 = 0.9       σ2 = 0.8
n1 = 36        n2 = 49
What is the probability that a random sample of 36 tubes from manufacturer A will
have a mean lifetime that is at least 1 year more than the mean lifetime of a sample
of 49 tubes from manufacturer B?
Sampling Distribution of the Difference between Two Averages
The sampling distribution of $\bar{X}_1 - \bar{X}_2$ will be approximately normal and will have a
mean and standard deviation
$$\mu_{\bar{X}_1 - \bar{X}_2} = 6.5 - 6.0 = 0.5, \qquad \sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{0.81}{36} + \frac{0.64}{49}} = 0.189$$
$$z = \frac{1.0 - 0.5}{0.189} = 2.65$$
$$P(\bar{X}_1 - \bar{X}_2 \ge 1.0) = P(Z > 2.65) = 1 - P(Z < 2.65) = 1 - 0.9960 = 0.0040$$
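Both two-sample examples (paint drying times and television tubes) follow the same recipe from Theorem 8.3, which can be sketched as a small helper. The function name `prob_difference_exceeds` is hypothetical, and the normal model for the difference is the approximation stated in the theorem.

```python
from math import sqrt
from statistics import NormalDist

def prob_difference_exceeds(d, mu1, mu2, sigma1, sigma2, n1, n2):
    """P(Xbar1 - Xbar2 > d) under the normal model of Theorem 8.3."""
    mean_diff = mu1 - mu2
    sd_diff = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return 1.0 - NormalDist(mean_diff, sd_diff).cdf(d)

# Example 8.8: equal mean drying times, both sigmas 1.0, n_A = n_B = 18.
p_paint = prob_difference_exceeds(1.0, 0.0, 0.0, 1.0, 1.0, 18, 18)

# Example 8.9: television tubes from manufacturers A and B.
p_tubes = prob_difference_exceeds(1.0, 6.5, 6.0, 0.9, 0.8, 36, 49)

print(round(p_paint, 4), round(p_tubes, 4))
```

Small differences from the slide values come from rounding z to two decimals before the table look-up.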
Normal Approximation to Binomial
Recall the binomial distribution function
$$b(x; n, p) = \binom{n}{x} p^x (1 - p)^{n - x}$$
Probabilities associated with binomial experiments are readily obtainable from the
distribution formula b(x; n, p) or from Table A.1 when n is small.
However, it is very difficult to obtain these probabilities by hand when n is large. For
instance, consider computing b(20; 100, 0.3) using your calculator. Therefore,
when n is large we use the normal approximation.
The normal approximation is quite convenient because the cumulative normal
distribution is tabulated.
Normal Approximation to Binomial
Theorem 6.2. If X is a binomial random variable with mean µ = np and variance
σ² = np(1-p), then the limiting form of the distribution of
$$Z = \frac{X - np}{\sqrt{np(1 - p)}}$$
as n → ∞, is the standard normal distribution n(z; 0, 1).
It turns out that the approximation with mean µ = np and variance σ² = np(1-p) is very
accurate if n is very large and p is not too close to 0 or 1. As a rule of thumb, we use
this approximation if np ≥ 5 and n(1-p) ≥ 5.
Let us consider X ~ b(x; 15, 0.4) to illustrate the normal approximation to the
binomial distribution. Let us compute P(X=4) and P(7 ≤X≤9) by using the normal
approximation.
Normal Approximation to Binomial
When computing the probabilities, we need to include the continuity correction of 0.5.
[Figure: histogram of b(x; 15, 0.4) for x = 0, 1, ..., 15 with the approximating normal curve; vertical axis from 0.00 to 0.25]
Normal Approximation to Binomial
Exact value:
$$P(X = 4) = b(4; 15, 0.4) = 0.1268$$
Normal approximation:
$$P(X = 4) \approx P\left( \frac{3.5 - 6}{\sqrt{3.6}} < Z < \frac{4.5 - 6}{\sqrt{3.6}} \right) = P(-1.32 < Z < -0.79)$$
$$= P(Z < -0.79) - P(Z < -1.32) = 0.2148 - 0.0934 = 0.1214$$
Exact value:
$$P(7 \le X \le 9) = \sum_{x=7}^{9} b(x; 15, 0.4) = b^{*}(9; 15, 0.4) - b^{*}(6; 15, 0.4) = 0.9662 - 0.6098 = 0.3564$$
Normal approximation:
$$P(7 \le X \le 9) \approx P\left( \frac{6.5 - 6}{1.897} < Z < \frac{9.5 - 6}{1.897} \right) = P(0.26 < Z < 1.85)$$
$$= P(Z < 1.85) - P(Z < 0.26) = 0.9678 - 0.6026 = 0.3652$$
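The exact binomial probabilities and their normal approximations for X ~ b(x; 15, 0.4) can be compared directly. The following sketch uses only the standard library; the helper names are illustrative, and the ±0.5 terms are the continuity correction mentioned above.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_pmf(x, n, p):
    """b(x; n, p) = C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_range_exact(a, b, n, p):
    """Exact P(a <= X <= b) for a binomial variable."""
    return sum(binom_pmf(x, n, p) for x in range(a, b + 1))

def binom_range_normal(a, b, n, p):
    """Normal approximation to P(a <= X <= b) with continuity correction."""
    approx = NormalDist(n * p, sqrt(n * p * (1 - p)))
    return approx.cdf(b + 0.5) - approx.cdf(a - 0.5)

n, p = 15, 0.4
exact_4 = binom_range_exact(4, 4, n, p)
approx_4 = binom_range_normal(4, 4, n, p)
exact_7_9 = binom_range_exact(7, 9, n, p)
approx_7_9 = binom_range_normal(7, 9, n, p)

print(round(exact_4, 4), round(approx_4, 4))
print(round(exact_7_9, 4), round(approx_7_9, 4))
```

The approximations differ slightly from the slide values because the slides round z to two decimals before using the normal table.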
Normal Approximation to Binomial
Example 6.15. The probability that a patient recovers from a rare blood disease is
0.4. If 100 people are known to have contracted this disease, what is the probability
that fewer than 30 will survive?
Let the binomial variable X represent the number of patients that survive. Since
n = 100, we should obtain fairly accurate results using the normal-curve
approximation.
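For Example 6.15, "fewer than 30" means X ≤ 29, so the continuity-corrected boundary is 29.5. The sketch below compares the normal-curve value with the exact binomial sum; it is meant as a check on the hand calculation, not as the slide's own solution.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.4

# Exact: P(X <= 29) summed term by term from the binomial formula.
exact = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(30))

# Normal approximation with mu = np = 40 and sigma = sqrt(np(1-p)),
# evaluated at 29.5 because of the continuity correction.
approx = NormalDist(n * p, sqrt(n * p * (1 - p))).cdf(29.5)

print(round(exact, 4), round(approx, 4))
```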
Sampling Distribution of S2
Recall the definition of the sample statistic S².
Definition 8.6. If X1, X2, X3, …, Xn represent a random sample of size n, then the
sample variance is defined by the statistic
$$S^2 = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{n - 1}$$
If a random sample of size n is drawn from A NORMAL POPULATION with mean µ
and variance σ², and the sample variance is computed, we obtain a value of the
statistic S². We shall proceed to consider the distribution of the statistic $(n-1)S^2/\sigma^2$.
Theorem 8.4. If S² is the variance of a random sample of size n drawn from a normal
population having the variance σ², then the statistic
$$\chi^2 = \frac{(n - 1) S^2}{\sigma^2} = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma^2}$$
has a chi-squared distribution with ν = n – 1 degrees of freedom.
Sampling Distribution of S2
But what is a chi-square distribution?
Theorem. If Z1, Z2, …, Zn are independent random variables each having a standard
normal distribution, then
$$Y = Z_1^2 + Z_2^2 + \cdots + Z_n^2$$
has a chi-squared distribution with n degrees of freedom.
Then, why does the distribution in Theorem 8.4 have ν = n – 1 degrees of
freedom? What is a “degree of freedom”?
Sampling Distribution of S2
The probability that a random sample produces a χ² value greater than some
specified value is equal to the area under the curve to the right of this value. It is
customary to let $\chi^2_\alpha$ represent the χ² value above which we find an area of α.
This is illustrated below by the shaded region:
[Figure: chi-squared density on an axis from 0.0 to 30.0, with the area α to the right of $\chi^2_\alpha$ shaded]
Table A.5 on pages 755-756 in the textbook gives values of $\chi^2_\alpha$ for various values of
α and ν. The areas, α, are the column headings; the degrees of freedom, ν, are given in
the left column; and the table entries are the χ² values.
Sampling Distribution of S2
The χ² value with 7 degrees of freedom, leaving an area of 0.05 to the right, is $\chi^2_{0.05} = 14.07$.
Sampling Distribution of S2
Example: Assume a random variable Y has a chi-squared distribution with 9 degrees
of freedom. What is the probability that
(a) 12.24<Y<16.92
(b) 21.67<Y
Example: Assume a population is normally distributed with mean µ and variance σ².
A random sample of 11 items is drawn from the population and the sample variance
s² is computed. What is the probability that the sample variance s² is at least 1.6
times the variance of the population σ²?
Example: Assume a population is normally distributed with a mean of 100 and a
standard deviation of 5 units. A random sample of 11 items is drawn from the
population and the sample standard deviation s is computed. What is the
probability that
(a) 6<s
(b) 3<s<7
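Chi-squared probabilities such as those in the first two examples are normally read from Table A.5, but they can also be computed. The sketch below is one possible approach, not the course's method: it evaluates the chi-squared CDF through the standard power series for the regularized lower incomplete gamma function, and the function name `chi2_cdf` is an assumption of this example.

```python
from math import exp, lgamma, log

def chi2_cdf(x, dof):
    """P(Y <= x) for a chi-squared variable with `dof` degrees of freedom,
    computed as the regularized lower incomplete gamma function P(dof/2, x/2)."""
    if x <= 0:
        return 0.0
    a, half_x = dof / 2.0, x / 2.0
    # Series: sum over k of half_x^k / (a (a+1) ... (a+k)), starting with 1/a.
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= half_x / (a + k)
        total += term
    return total * exp(a * log(half_x) - half_x - lgamma(a))

# Table check: 7 degrees of freedom, area to the right of 14.07.
tail_7 = 1.0 - chi2_cdf(14.07, 7)

# First example: Y with 9 degrees of freedom.
p_a = chi2_cdf(16.92, 9) - chi2_cdf(12.24, 9)   # P(12.24 < Y < 16.92)
p_b = 1.0 - chi2_cdf(21.67, 9)                  # P(Y > 21.67)

# Second example: P(S^2 >= 1.6 sigma^2) with n = 11, i.e.
# P((n-1) S^2 / sigma^2 >= 10 * 1.6) with 10 degrees of freedom.
p_var = 1.0 - chi2_cdf(10 * 1.6, 10)

print(round(tail_7, 3), round(p_a, 3), round(p_b, 3), round(p_var, 3))
```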
Sampling Distribution of S2
Example 8.10. A manufacturer of car batteries guarantees that his batteries will
last, on the average, 3 years with a standard deviation of 1 year. If five of these
batteries have lifetimes of 1.9, 2.4, 3.0, 3.5, and 4.2 years, is the manufacturer still
convinced that his batteries have a standard deviation of 1 year? Assume that the
battery lifetime follows a normal distribution.
First find the sample variance
$$s^2 = \frac{(5)(1.9^2 + 2.4^2 + 3.0^2 + 3.5^2 + 4.2^2) - (1.9 + 2.4 + 3.0 + 3.5 + 4.2)^2}{(5)(4)} = \frac{(5)(48.26) - (15)^2}{(5)(4)} = 0.815$$
Then compute
$$\chi^2 = \frac{(n - 1) s^2}{\sigma^2} = \frac{(5 - 1)(0.815)}{1} = 3.26$$
Now, what is your conclusion?
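The arithmetic of Example 8.10 can be verified with a few lines of Python. The two table values 0.484 and 11.143 are the chi-squared points for 4 degrees of freedom shown in the accompanying figure; treating a statistic between them as unremarkable is the usual reading of that figure and is stated here as an assumption.

```python
from statistics import variance

lifetimes = [1.9, 2.4, 3.0, 3.5, 4.2]   # years
sigma0_sq = 1.0                          # claimed variance (sigma = 1 year)

n = len(lifetimes)
s_sq = variance(lifetimes)               # sample variance with n - 1 divisor
chi_sq = (n - 1) * s_sq / sigma0_sq      # statistic of Theorem 8.4, 4 d.o.f.

# Table values for 4 degrees of freedom, each cutting off a 2.5% tail.
CHI2_0975_4 = 0.484
CHI2_0025_4 = 11.143
within_central_95 = CHI2_0975_4 < chi_sq < CHI2_0025_4

print(round(s_sq, 3), round(chi_sq, 2), within_central_95)
```

A value inside the central 95% region gives no reason to doubt the claimed standard deviation of 1 year.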
Sampling Distribution of S2
[Figure: chi-squared density with ν = 5 − 1 = 4 degrees of freedom on an axis from 0.0 to 20.0; the points $\chi^2_{0.975} = 0.484$ and $\chi^2_{0.025} = 11.143$ each cut off a tail area of 2.5%]
t-Distribution
Theorem 8.5. Let Z be a standard normal random variable and V a chi-squared
random variable with ν degrees of freedom. If Z and V are independent, then the
distribution of the random variable T, where
$$T = \frac{Z}{\sqrt{V / \nu}},$$
is given by the density function
$$h(t) = \frac{\Gamma[(\nu + 1)/2]}{\Gamma(\nu/2)\sqrt{\pi \nu}} \left( 1 + \frac{t^2}{\nu} \right)^{-(\nu + 1)/2}, \qquad -\infty < t < \infty.$$
This is known as the t-distribution with ν degrees of freedom.
t-Distribution
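Now consider a sample of size n drawn from a normal population and remember
that
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
has the standard normal distribution. For the same sample, consider the following
statistic related to the variance of the sample:
$$V = \frac{(n - 1) S^2}{\sigma^2}$$
The random variable V has a chi-squared distribution with (n-1) degrees of freedom.
Since $\bar{X}$ and $S^2$ are independent when sampling from a normal population, Theorem 8.5
applies and
$$T = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{\sqrt{\dfrac{(n - 1) S^2 / \sigma^2}{n - 1}}} = \frac{\bar{X} - \mu}{S / \sqrt{n}}$$
has a t-distribution with (n-1) degrees of freedom.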
t-Distribution
Corollary 8.1. Let X1, X2, …, Xn be independent random variables that are all normal
with mean µ and standard deviation σ. Let
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \qquad \text{and} \qquad S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})^2.$$
Then the random variable $T = \dfrac{\bar{X} - \mu}{S / \sqrt{n}}$ has a t-distribution with ν = n – 1 degrees of
freedom.
t-Distribution
Below, you can see a graph of the t-distribution with various degrees of freedom.
Observe that t-distribution approaches the standard normal distribution as the
sample size n increases.
[Figure: densities of the standard normal distribution (Z) and of t-distributions with 15, 4, and 2 degrees of freedom; horizontal axis from -5.0 to 5.0]
t-Distribution
It is customary to let $t_\alpha$ represent the t-value above which we find an area equal
to α. Hence, the t-value with 10 degrees of freedom leaving an area of 0.05 to its
right is t = 1.812. Since the t-distribution is symmetric about a mean of zero, we have
$t_{1-\alpha} = -t_\alpha$; that is, the t-value leaving an area of 1 – α to the right and therefore an
area of α to the left is equal to the negative t-value that leaves an area of α in the
right tail of the distribution. That is, t0.95 = – t0.05, t0.99 = – t0.01, and so forth.
For ν = 10 degrees of freedom, $t_{0.05} = 1.812$ and
$$P(T > t_{0.05}) = P(T > 1.812) = 0.05$$
[Figure: t density with ν = 10 on an axis from -4.0 to 4.0; the area 0.05 lies to the right of t0.05 = 1.812]
t-Distribution
Example 8.11. What is the t-value with ν = 14 degrees of freedom that leaves an area of
(a) 0.025 to the right
(b) 0.975 to the right
(c) 0.75 to the right
Example 8.13. Find k such that P(k < T < -1.761) = 0.045 for a random sample of size 15
selected from a normal distribution and $T = \dfrac{\bar{X} - \mu}{S / \sqrt{n}}$.
t-Distribution
Example 8.14. A chemical engineer claims that the population mean yield of a
certain batch process is 500 grams per milliliter of raw material. To check this claim
he samples 25 batches each month. If the computed t-value falls between –t0.05
and t0.05, he is satisfied with his claim. What conclusion should he draw from a
sample that has a mean $\bar{x} = 518$ grams per milliliter and a sample standard
deviation s = 40 grams per milliliter? Assume the distribution of yields to be
approximately normal.
t-Distribution
Example 8.50. A manufacturing firm claims that the batteries used in their
electronic games will last an average of 30 hours. To maintain this average, 16
batteries are tested each month. If the computed t-value falls between –t0.025 and
t0.025, the firm is satisfied with its claim. What conclusion should the firm draw from
a sample that has a mean of 27.5 hours and a standard deviation of 5 hours?
Assume the distribution of battery lives to be approximately normal.
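The t-values for Example 8.14 and Example 8.50 can be computed as follows. This is a sketch: the critical values 1.711 (t0.05 with 24 degrees of freedom) and 2.131 (t0.025 with 15 degrees of freedom) are taken from a standard t table rather than from the slides, and the function name is illustrative.

```python
from math import sqrt

def t_statistic(sample_mean, claimed_mean, s, n):
    """T = (xbar - mu) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (sample_mean - claimed_mean) / (s / sqrt(n))

# Example 8.14: yield claim of 500, n = 25, xbar = 518, s = 40.
t_yield = t_statistic(518, 500, 40, 25)
T_05_24 = 1.711          # table value, assumption: t_0.05 with 24 d.o.f.
yield_within = -T_05_24 < t_yield < T_05_24

# Example 8.50: battery claim of 30 hours, n = 16, xbar = 27.5, s = 5.
t_battery = t_statistic(27.5, 30, 5, 16)
T_025_15 = 2.131         # table value, assumption: t_0.025 with 15 d.o.f.
battery_within = -T_025_15 < t_battery < T_025_15

print(t_yield, yield_within)
print(t_battery, battery_within)
```

A t-value outside the stated interval suggests the sample is hard to reconcile with the claimed mean, while a value inside gives no reason to abandon the claim.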
F-Distribution
The F-distribution is used in the comparison of sample variances.
The statistic F is defined to be the ratio of two independent chi-squared random
variables, each divided by its number of degrees of freedom. Hence, we can write
$$F = \frac{U / \nu_1}{V / \nu_2},$$
where U and V are independent random variables having chi-squared distributions
with ν1 and ν2 degrees of freedom, respectively.
Theorem 8.6. Let U and V be two independent random variables having chi-squared
distributions with ν1 and ν2 degrees of freedom, respectively. Then the distribution of
the random variable $F = \dfrac{U/\nu_1}{V/\nu_2}$ is given by the density
$$h(f) = \begin{cases} \dfrac{\Gamma[(\nu_1 + \nu_2)/2]\, (\nu_1/\nu_2)^{\nu_1/2}}{\Gamma(\nu_1/2)\, \Gamma(\nu_2/2)} \, \dfrac{f^{(\nu_1/2) - 1}}{(1 + \nu_1 f / \nu_2)^{(\nu_1 + \nu_2)/2}} & \text{if } f > 0, \\ 0 & \text{if } f \le 0. \end{cases}$$
This is known as the F-distribution with ν1 and ν2 degrees of freedom.
F-Distribution
The curve of the F-distribution depends not only on the two parameters ν1 and ν2 but
also on the order in which we state them.
[Figure: densities of the F-distribution with (10, 30) and with (6, 10) degrees of freedom; horizontal axis from 0.00 to 4.00]
F-Distribution
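We let $f_\alpha$ be the f value above which we find an area equal to α. Table A.6 in your
textbook gives values of $f_\alpha$ only for small upper-tail areas α (such as 0.05).
[Figure: density of the F-distribution with (6, 20) degrees of freedom; the area α lies to the right of $f_\alpha(6, 20)$]
Theorem 8.7. Writing $f_\alpha(\nu_1, \nu_2)$ for $f_\alpha$ with ν1 and ν2 degrees of freedom, we obtain
$$f_{1-\alpha}(\nu_2, \nu_1) = \frac{1}{f_\alpha(\nu_1, \nu_2)}$$
Example. Find $f_{0.95}(6, 10)$.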
F-Distribution
Suppose that random samples of size n1 and n2 are selected from two normal
populations with variances σ1² and σ2², respectively. Define two random variables U
and V:
$$U = \frac{(n_1 - 1) S_1^2}{\sigma_1^2}, \qquad V = \frac{(n_2 - 1) S_2^2}{\sigma_2^2}$$
We know that U and V are random variables having chi-squared distributions with
ν1 = n1 – 1 and ν2 = n2 – 1 degrees of freedom, respectively. Recall that
$$F = \frac{U / \nu_1}{V / \nu_2} = \frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2}$$
is an F-distributed random variable with ν1 and ν2 degrees of freedom, as stated in
the following theorem.
Theorem 8.8. If $S_1^2$ and $S_2^2$ are the variances of independent random samples of size
$n_1$ and $n_2$ taken from normal populations with variances $\sigma_1^2$ and $\sigma_2^2$, respectively,
then
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$
has an F-distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom.
Example: If $S_1^2$ and $S_2^2$ represent the variances of independent random samples of
size $n_1 = 8$ and $n_2 = 12$, taken from normal populations with equal variances,
find $P(S_1^2 / S_2^2 < 4.89)$.
What if $\sigma_1^2 = 4$ and $\sigma_2^2 = 2$?
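The example can be approached by noting that, by Theorem 8.8, $P(S_1^2/S_2^2 < c) = P(F < c\,\sigma_2^2/\sigma_1^2)$ with 7 and 11 degrees of freedom. With equal variances this is $P(F < 4.89)$. With $\sigma_1^2 = 4$ and $\sigma_2^2 = 2$ it becomes $P(F < 2.445)$. The Monte Carlo sketch below estimates both probabilities. It is an illustration rather than the slides' own solution: it reads the garbled inequality as "<", and the function names, replication count, and seed are arbitrary assumptions.

```python
import random


def sample_variance(xs):
    """Unbiased sample variance S^2 (divisor n - 1)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def prob_ratio_below(threshold, n1, n2, var1, var2, reps=20_000, seed=8):
    """Monte Carlo estimate of P(S1^2 / S2^2 < threshold) for two normal samples."""
    rng = random.Random(seed)
    sd1, sd2 = var1 ** 0.5, var2 ** 0.5
    hits = 0
    for _ in range(reps):
        s1_sq = sample_variance([rng.gauss(0.0, sd1) for _ in range(n1)])
        s2_sq = sample_variance([rng.gauss(0.0, sd2) for _ in range(n2)])
        if s1_sq / s2_sq < threshold:
            hits += 1
    return hits / reps


# Equal population variances: the ratio S1^2/S2^2 is itself F(7, 11).
p_equal = prob_ratio_below(4.89, 8, 12, var1=1.0, var2=1.0)

# sigma1^2 = 4, sigma2^2 = 2: equivalent to P(F < 4.89 * 2 / 4) = P(F < 2.445).
p_unequal = prob_ratio_below(4.89, 8, 12, var1=4.0, var2=2.0)

print(f"equal variances:   P ~ {p_equal:.3f}")
print(f"sigma1^2=4, sigma2^2=2: P ~ {p_unequal:.3f}")
```

The equal-variance estimate should land near 0.99, which matches the common F-table entry $f_{0.01}(7, 11) = 4.89$ (a value to confirm in a table that includes $\alpha = 0.01$).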
Some Exercises
Exercise 8.54. Pull-strength tests on 10 soldered leads for a semiconductor device
yield the following results, in pounds of force required to rupture the bond:
19.8, 12.7, 13.2, 16.9, 10.6, 18.8, 11.1, 14.3, 17.0, 12.5
Another set of 8 leads was tested after encapsulation to determine whether
the pull strength has been increased by encapsulation of the device, with the
following results:
24.9, 22.8, 23.6, 22.1, 20.4, 21.6, 21.8, 22.5
Comment on the evidence concerning equality of the two population variances.
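A possible starting point for this exercise is to compute the two sample variances and their ratio, which follows an F-distribution with 9 and 7 degrees of freedom if the population variances are equal (Theorem 8.8). The sketch below does only that arithmetic; the variable names are illustrative. The observed ratio can then be compared with tabulated values such as $f_{0.05}(9, 7) = 3.68$ and $f_{0.01}(9, 7) = 6.72$, which come from standard F tables rather than from the slides.

```python
import statistics

before = [19.8, 12.7, 13.2, 16.9, 10.6, 18.8, 11.1, 14.3, 17.0, 12.5]
after = [24.9, 22.8, 23.6, 22.1, 20.4, 21.6, 21.8, 22.5]

s1_sq = statistics.variance(before)   # sample variance, divisor n - 1
s2_sq = statistics.variance(after)

# If sigma1^2 = sigma2^2, then S1^2 / S2^2 follows F(n1 - 1, n2 - 1).
f_observed = s1_sq / s2_sq
df = (len(before) - 1, len(after) - 1)

print(f"s1^2 = {s1_sq:.3f}, s2^2 = {s2_sq:.3f}")
print(f"f = {f_observed:.2f} with {df} degrees of freedom")
```

A ratio falling between the 0.05 and 0.01 table values would suggest that the upper-tail probability is small, which casts doubt on equal variances.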
Example: The expression level of a gene is measured in a number of control
subjects and patients. The values measured in controls are: 10, 12, 11, 15, 13, 11, 12 and
the values measured in patients are: 12, 13, 13, 15, 12, 18, 17, 16, 16, 12, 15, 10, 12. Is the
variance different between controls and patients?
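A sketch of the arithmetic for this example, assuming both populations are normal and placing the larger sample variance in the numerator. The controls have mean 12 and sum of squared deviations 16; the patients have sum 181 and sum of squares 2589.

```latex
s_c^2 = \frac{16}{6} \approx 2.667, \qquad
s_p^2 = \frac{2589 - 181^2/13}{12} \approx 5.744, \qquad
f = \frac{s_p^2}{s_c^2} \approx 2.15 \quad \text{with } (12, 6) \text{ degrees of freedom}
```

For comparison, a standard F table gives $f_{0.05}(12, 6) = 4.00$ (a value not stated in the slides), so a ratio near 2.15 would not by itself indicate a difference in variances.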
Exercise 8.67. The breaking strength of a certain rivet used in a machine engine has
a mean of 5000 psi and a standard deviation of 400 psi. A random sample of 36 rivets is
taken. Consider the distribution of $\bar{X}$, the sample mean breaking strength.
(a) What is the probability that the sample mean falls between 4800 psi and 5200
psi?
(b) What sample size n would be necessary in order to have
$P(4900 < \bar{X} < 5100) = 0.99$?
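One way to approach both parts is through the normal approximation for the sample mean, which has mean µ and standard deviation σ/√n. The sketch below uses Python's `statistics.NormalDist`; the variable names are illustrative, and reading the garbled statement in (b) as $P(4900 < \bar{X} < 5100) = 0.99$ is an assumption.

```python
import math
from statistics import NormalDist

mu, sigma, n = 5000.0, 400.0, 36

# (a) The sample mean is approximately normal with standard deviation sigma / sqrt(n).
mean_dist = NormalDist(mu, sigma / math.sqrt(n))
p_a = mean_dist.cdf(5200) - mean_dist.cdf(4800)

# (b) Need 100 / (sigma / sqrt(n)) >= z, where z leaves 0.005 in each tail.
z = NormalDist().inv_cdf(0.995)
n_required = math.ceil((z * sigma / 100.0) ** 2)

print(f"(a) P(4800 < Xbar < 5200) ~ {p_a:.4f}")
print(f"(b) z = {z:.4f}, smallest n = {n_required}")
```

In part (a) the limits sit three standard errors on either side of the mean, since 400/√36 ≈ 66.7.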
Exercise 8.52. A maker of a certain brand of low-fat cereal bars claims that their
average saturated fat content is 0.5 gram. In a random sample of 8 cereal bars of
this brand, the saturated fat content was 0.6, 0.7, 0.7, 0.3, 0.4, 0.5, 0.4 and 0.2.
Would you agree with the claim? Assume a normal distribution.
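For Exercise 8.52, a natural statistic is $t = (\bar{x} - \mu)/(s/\sqrt{n})$, which follows a t-distribution with n − 1 = 7 degrees of freedom when the claim µ = 0.5 holds and the data are normal. The sketch below computes it; the variable names are illustrative.

```python
import math
import statistics

fat = [0.6, 0.7, 0.7, 0.3, 0.4, 0.5, 0.4, 0.2]
mu_claimed = 0.5

n = len(fat)
x_bar = statistics.mean(fat)
s = statistics.stdev(fat)                       # sample standard deviation, divisor n - 1
t_observed = (x_bar - mu_claimed) / (s / math.sqrt(n))

print(f"x_bar = {x_bar:.3f}, s = {s:.4f}")
print(f"t = {t_observed:.3f} with {n - 1} degrees of freedom")
```

The observed value can be compared with a t-table entry such as $t_{0.025} = 2.365$ for 7 degrees of freedom (from standard tables, not the slides). A value well inside ±2.365 gives no reason to doubt the claim.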
Exercise 8.74. Suppose a filling machine is used to fill cartons with a liquid product.
The specification that is strictly enforced for the filling machine is 9±1.5 oz. If any
carton is produced with weight outside these bounds, it is considered defective. It
is hoped that at least 99% of cartons will meet these specifications. With the
conditions µ=9 and σ=1, what proportion of cartons from the process are defective?
If changes are made to reduce variability, what must σ be reduced to in order to
meet specification with probability 0.99? Assume a normal distribution for the
weight.
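A sketch of the calculation for Exercise 8.74, using the standard normal values $\Phi(1.5) \approx 0.9332$ and $z_{0.005} \approx 2.576$ (taken from normal tables, not from the slides). With µ = 9 and σ = 1, a carton is defective when its weight is more than 1.5 oz from the mean.

```latex
P(\text{defective}) = P(|Z| > 1.5) = 2\,[1 - \Phi(1.5)] \approx 2(0.0668) = 0.1336

\frac{1.5}{\sigma} = z_{0.005} \approx 2.576
\quad\Longrightarrow\quad
\sigma \approx \frac{1.5}{2.576} \approx 0.582
```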
Exercise 8.75. Consider the situation in 8.74. Suppose a considerable quality effort
is conducted to tighten the variability in the system. Following the effort, a random
sample of size 40 is taken and the sample variance is $s^2 = 0.188$ ounces². Do we have
strong evidence that $\sigma^2$ has been reduced below 1.0?
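One possible approach to Exercise 8.75 is to compute the chi-squared statistic under the assumption $\sigma^2 = 1.0$, which has n − 1 = 39 degrees of freedom.

```latex
\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{39\,(0.188)}{1.0} = 7.332
```

A chi-squared variable with 39 degrees of freedom has mean 39 and variance 78 (standard deviation about 8.8), so 7.332 lies far in the lower tail. A value this small would be very unlikely if $\sigma^2$ were still 1.0.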
Example: A sample of 10 fiber-reinforced polymer (FRP) strips mechanically
fastened to highway bridges was tested for bearing strength. The strength
measurement Y (in megapascals, MPa) was recorded for each strip. Assume
that Y is normally distributed with variance $\sigma^2 = 100$.
a) Find the probability that $s^2$ is less than 16.92.
b) The data for the experiment are listed below. Do these data tend to contradict
or support the assumption that $\sigma^2 = 100$?
240.9, 248.8, 215.7, 133.6, 231.4, 230.9, 225.3, 247.3, 235.5, 238.0
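For part (b), one possible check is to compute the sample variance and the statistic $(n-1)s^2/\sigma^2$, which follows a chi-squared distribution with 9 degrees of freedom if $\sigma^2 = 100$. The sketch below performs that arithmetic only; the variable names are illustrative.

```python
import statistics

strength = [240.9, 248.8, 215.7, 133.6, 231.4, 230.9, 225.3, 247.3, 235.5, 238.0]
sigma_sq_assumed = 100.0

n = len(strength)
s_sq = statistics.variance(strength)          # sample variance, divisor n - 1
# Chi-squared statistic with n - 1 = 9 degrees of freedom under the assumption.
chi_sq = (n - 1) * s_sq / sigma_sq_assumed

print(f"s^2 = {s_sq:.2f}")
print(f"chi-squared = {chi_sq:.2f} with {n - 1} degrees of freedom")
```

Since a chi-squared variable with 9 degrees of freedom has mean 9, a statistic many times larger than that would tend to contradict the assumption. Note that the single low reading of 133.6 contributes most of the sample variance.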