III Modeling Random Behavior
A. Probability
1. Overview
Statisticians use probability to model uncertainty.
Consider these statements:
• The probability that the next batch of TiO2 (white pigment) is unacceptable is .01.
• There is a 25% chance our firm will get the IBM order.
In each case, what we mean by a probability is
$$ \frac{\text{"size" of a set of interest}}{\text{"size" of the set of all possible outcomes}} $$
Some notation will aid our discussion.
An event is a set of possible outcomes of interest.
The sample space, S, is the set of all possible outcomes.
If A is an event, the probability that A occurs is
$$ P(A) = \frac{\text{size of } A}{\text{size of } S} $$
Note:
• 0 ≤ P(A) ≤ 1
• P(S) = 1
• Probability may be either objective (based on prior experience) or purely subjective.
• Ultimately, the accuracy of specific probabilities depends on assumptions.
• If the assumptions upon which we base a specific probability are wrong, then we
should not expect the specific probability to be any good.
Example: Making Nickel Battery Plate
A particular process for making nickel battery plate requires an operator to sift
nickel powder into a frame.
The process uses a very tight weight specification, which is difficult to make.
The supervisor monitored the last 1000 attempts made by an operator.
The operator successfully made the specification 379 times.
One way to get the probability of a successful attempt is
$$ P(\text{successful attempt}) = \frac{379}{1000} = .379 $$
The supervisor noted, however, that the operator seemed to get better over
time.
In this case, the supervisor may believe that the actual probability is something
larger than .379.
A perfectly reasonable, but subjective, estimate of the
probability of a successful attempt is 0.4.
2. Making Inferences Using Probabilities.
Suppose the supervisor really believes that the probability of a successful
attempt is 0.4.
Suppose further that out of the next 50 attempts, she is never successful.
Would you now believe that the probability of a success is still 0.4?
OF COURSE NOT!
Consider another scenario.
Suppose the first attempt is unsuccessful.
Do you have good reason to believe that the true probability of a success is
not 0.4?
Suppose the first two attempts are unsuccessful.
Suppose the first three are unsuccessful.
At what point do we begin to believe that the probability really is not 0.4?
The answer lies in calculating the probability of seeing y unsuccessful attempts in a row,
assuming a probability of success of 0.4.
Once that probability is small enough, we can reasonably conclude that the true
probability is not 0.4.
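A minimal sketch of that calculation, assuming the attempts are independent of one another (an assumption these notes only introduce later). With a success probability of 0.4, the chance that y attempts in a row all fail is 0.6 raised to the power y. The function name is illustrative only.

```python
def prob_failures_in_a_row(y, p_success=0.4):
    """Probability that y attempts in a row all fail, assuming independent attempts."""
    return (1.0 - p_success) ** y

# How quickly does a run of failures become implausible if p really is 0.4?
for y in (1, 2, 3, 5, 10, 50):
    print(f"{y:2d} failures in a row: {prob_failures_in_a_row(y):.3g}")
```

Three failures in a row still has probability 0.216, which is weak evidence against 0.4; fifty in a row is vanishingly unlikely.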
3. Conditional Probability
Often, two events are related.
Knowing the relationship between the two events allows us to model the behavior of
one event in terms of the other.
Conditional probability quantifies the chances of one event occurring given that the
other occurs.
We denote the probability that an event A occurs given that the event B has
occurred by P(A|B).
The key to conditional probability: the intersection of the two events defines the
nature of their relationship.
This concept is best illustrated by an example: Personal Computers
A major manufacturer of personal computers has introduced a new p.c.
• As with most new products, there seem to be some problems.
• This manufacturer offers a one-year warranty on this model.
Let
• A be the event that the hard drive on a specific computer fails within one year.
• B be the event that the floppy drive on a specific computer fails within one year.
Consider a specific computer whose floppy drive has failed.
In this case, we know that the event B has occurred.
What now is the probability that this same computer will have its hard drive fail?
What we seek is P(A|B).
Note: once we know that B has occurred, the sample space of interest is
restricted to B.
Similarly, once we know that B has occurred, the set of interest is restricted to
that portion of A which resides in B, A ∩ B.
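$$
\begin{aligned}
P(A \mid B) &= \frac{\text{size of the set of interest } (A \cap B)}{\text{size of the set of all possible outcomes } (B)} \\
&= \frac{\text{size of the set of interest } (A \cap B)}{\text{size of the set of all possible outcomes } (B)} \cdot \frac{1 / \text{size of } S}{1 / \text{size of } S} \\
&= \frac{\text{size of the set of interest } (A \cap B) \,/\, \text{size of } S}{\text{size of the set of all possible outcomes } (B) \,/\, \text{size of } S} \\
&= \frac{P(A \cap B)}{P(B)}
\end{aligned}
$$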
Definition: Conditional Probability
Let A and B be events in S.
The conditional probability of B given that A has occurred is
$$ P(B \mid A) = \frac{P(A \cap B)}{P(A)} $$
if P(A) > 0.
Similarly, the conditional probability of A given that B has occurred is
$$ P(A \mid B) = \frac{P(A \cap B)}{P(B)} $$
if P(B) > 0.
Example: Personal Computers - Continued
The reliability engineers have determined that:
P(A) = .02
P(B) = .05
P(A ∩ B) = .01
Note: P(A ∩ B) is the probability that both the hard and floppy drives on a
specific computer fail within one year.
The conditional probability that the hard drive fails given that the floppy drive
fails is
$$ P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{.01}{.05} = .2 $$
As a result, if we know that the floppy drive failed on a given machine, then the
probability the hard drive will fail also is 20%.
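The same calculation as a short sketch. The function and variable names are illustrative; the three probabilities are the ones quoted by the reliability engineers, with .01 taken as the probability that both drives fail.

```python
def conditional_probability(p_joint, p_condition):
    """P(A | B) = P(A and B) / P(B); only defined when P(B) > 0."""
    if p_condition <= 0:
        raise ValueError("the conditioning event must have positive probability")
    return p_joint / p_condition

p_hard = 0.02    # P(A): hard drive fails within one year
p_floppy = 0.05  # P(B): floppy drive fails within one year
p_both = 0.01    # P(A and B): both drives fail within one year

p_hard_given_floppy = conditional_probability(p_both, p_floppy)
p_floppy_given_hard = conditional_probability(p_both, p_hard)
print(p_hard_given_floppy, p_floppy_given_hard)
```

The second value, P(B | A) = 0.5, is not worked out in the notes; it follows from the same definition with the roles of A and B swapped.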
4. Independence
In many engineering situations, two events have no real relationship.
Knowing that one event has occurred offers no new information about the
chances the other will occur.
We call two such events independent.
Independence is important for a number of reasons:
• many engineering events either are independent or close enough for a first
approximation
• independence provides a powerful basis for modeling the joint behavior of
several events
• the formal concept of a random sample assumes that the observations are
independent.
Definition: Independence
Let A, B be events in S. A and B are said to be independent if
P(A | B) = P(A)
Similarly, if A and B are independent, then
P(B | A) = P(B)
Example: Personal Computer - Continued
Recall:
P(A) = .02
P(B) = .05
P(A | B) = .20
Note: the hard drive failing and the floppy drive failing are not independent
events because
P(A | B) ≠ P(A)
Why? Many personal computer designs use the floppy drive as an
expensive air filter.
As the floppy drive gets dirty, it becomes more likely to fail.
Also, as the floppy drive gets dirty, the p.c. does not vent heat as well, which
increases the likelihood that the hard drive fails.
5. Basic Rules of Probability
1. 0 ≤ P(A) ≤ 1.
2. If Ø is the empty set, then P(Ø) = 0.
3. The Probability of Complements
If A is an event in some sample space S, then the complement of the
set A relative to S is the set of outcomes in S which are not in
A.
We denote the complement of A by Ā.
P(Ā) = 1 - P(A)
4. The Additive Law of Probability
If A and B are events in S, then the union of A and B, denoted by A U B,
is the set of outcomes either in A or in B or in both.
The Additive Law of Probability is
P(A U B) = P(A) + P(B) - P(A ∩ B)
If A and B are mutually exclusive, then A ∩ B = Ø and P(A ∩ B) = 0; thus,
P(A U B) = P(A) + P(B)
5. The Multiplicative Law of Probability
If A and B are events in S, with P(A) > 0 and P(B) > 0, then
$$ P(B \mid A) = \frac{P(A \cap B)}{P(A)} $$
Thus,
P(A ∩ B) = P(A) • P(B | A)
Similarly,
P(A ∩ B) = P(B) • P(A | B)
If A and B are independent, then
P(A | B) = P(A)
P(B | A) = P(B)
Thus, if A and B are independent, then
P(A ∩ B) = P(A) • P(B)
This property is a very powerful result, making independence quite important
for finding the probabilities associated with the intersections of events.
6. Simplest Form of the Law of Total Probability
Let A and B be events in S. We may partition B into two parts:
• that which overlaps A, A ∩ B, and
• that which overlaps Ā, Ā ∩ B
Thus,
P(B) = P(A ∩ B) + P(Ā ∩ B)
= P(A) • P(B | A) + P(Ā) • P(B | Ā)
7. The Simplest Form of Bayes Rule
Let A and B be events in S.
Suppose we are given P(A), P(B | A), and P(B | Ā). Then
$$ P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A) \cdot P(B \mid A)}{P(A) \cdot P(B \mid A) + P(\bar{A}) \cdot P(B \mid \bar{A})} $$
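A small sketch of Bayes rule built on the law of total probability. The notes give no numbers for this rule, so the inputs below are derived here from the personal computer example (P(A) = .02, P(B) = .05, and .01 for both drives failing); P(B | A) and P(B | Ā) are computed values, not figures stated in the notes, and the function name is illustrative.

```python
def bayes_posterior(p_a, p_b_given_a, p_b_given_not_a):
    """P(A | B) from P(A), P(B | A) and P(B | not A)."""
    # Law of total probability: P(B) = P(A) P(B|A) + P(not A) P(B|not A)
    p_b = p_a * p_b_given_a + (1.0 - p_a) * p_b_given_not_a
    return p_a * p_b_given_a / p_b

p_a = 0.02                                    # P(A): hard drive fails
p_b_given_a = 0.01 / 0.02                     # derived: P(A and B) / P(A)
p_b_given_not_a = (0.05 - 0.01) / (1 - 0.02)  # derived: P(not A and B) / P(not A)

posterior = bayes_posterior(p_a, p_b_given_a, p_b_given_not_a)
print(posterior)
```

With these derived inputs the rule recovers the same P(A | B) = 0.2 found directly from the definition, which is a useful consistency check.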
Example: Toothpaste Containers
A toothpaste company uses four injection molding processes to make its
toothpaste containers.
These are older pieces of equipment and subject to problems.
Event   Description                                    Prob
A       Machine 1 has a problem on any specific day    0.1
B       Machine 2 has a problem on any specific day    0.2
C       Machine 3 has a problem on any specific day    0.05
D       Machine 4 has a problem on any specific day    0.05
What is the probability that no problems occur on any specific day?
Note: no problems means
• Machine 1 has no problems, Ā, and
• Machine 2 has no problems, B̄, and
• Machine 3 has no problems, C̄, and
• Machine 4 has no problems, D̄.
Thus, we seek the probability of an intersection.
If we can assume independence, then the probability of the intersection is the
product of the individual probabilities.
$$
\begin{aligned}
P(\text{no problems}) &= P(\bar{A} \cap \bar{B} \cap \bar{C} \cap \bar{D}) \\
&= P(\bar{A}) \cdot P(\bar{B}) \cdot P(\bar{C}) \cdot P(\bar{D}) \\
&= (1 - 0.1)(1 - 0.2)(1 - 0.05)(1 - 0.05) \\
&= (.9)(.8)(.95)(.95) = 0.6498
\end{aligned}
$$
B. Discrete Random Variables
1. Overview
Let Y be the number of problems that occur on a given day.
What does Y = 0 mean?
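No problems, which is
$$ \bar{A} \cap \bar{B} \cap \bar{C} \cap \bar{D} \;\Rightarrow\; P(Y = 0) = .6498 $$
What does Y = 1 mean?
Exactly one problem, which is
$$ A \cap \bar{B} \cap \bar{C} \cap \bar{D} \ \text{ or } \ \bar{A} \cap B \cap \bar{C} \cap \bar{D} \ \text{ or } \ \bar{A} \cap \bar{B} \cap C \cap \bar{D} \ \text{ or } \ \bar{A} \cap \bar{B} \cap \bar{C} \cap D $$
Note: Each one of these events is mutually exclusive of the others; thus,
$$
\begin{aligned}
P(Y = 1) &= P(A \cap \bar{B} \cap \bar{C} \cap \bar{D}) + P(\bar{A} \cap B \cap \bar{C} \cap \bar{D}) \\
&\quad + P(\bar{A} \cap \bar{B} \cap C \cap \bar{D}) + P(\bar{A} \cap \bar{B} \cap \bar{C} \cap D)
\end{aligned}
$$
When all is said and done, P(Y = 1) = 0.30305.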
In a similar manner, we can show that
P(Y = 2) = 0.04455
P(Y = 3) = 0.00255
P(Y = 4) = 0.00005
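These probabilities can be reproduced by brute force. The sketch below assumes, as the example does, that the four machines behave independently; it enumerates all 16 problem/no-problem combinations and adds each combination's probability to the count of problems it contains. The function name is illustrative.

```python
from itertools import product

problem_probs = [0.1, 0.2, 0.05, 0.05]  # P(problem) for machines 1 to 4

def count_distribution(probs):
    """Return [P(Y = 0), P(Y = 1), ...] where Y counts the machines with a problem."""
    dist = [0.0] * (len(probs) + 1)
    for outcome in product([0, 1], repeat=len(probs)):  # 1 means "has a problem"
        prob = 1.0
        for has_problem, p in zip(outcome, probs):
            prob *= p if has_problem else 1.0 - p  # independence: multiply
        dist[sum(outcome)] += prob                 # mutually exclusive: add
    return dist

pmf = count_distribution(problem_probs)
for y, p_y in enumerate(pmf):
    print(f"P(Y = {y}) = {p_y:.5f}")
```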
Y is an example of a random variable.
We describe the behavior of a random variable by its distribution.
Every random variable has a cumulative distribution function, F(y), defined by
F(y) = P(Y ≤ y)
In our case
y     F(y)
0     0.64980
1     0.95285
2     0.99740
3     0.99995
4     1.00000
There are two types of random variables:
• Discrete, which have a countable number of possible values
• Continuous, which take values over a continuum (an uncountable number of possible values).
Discrete random variables have a probability function, p(y) defined by
p(y) = P(Y = y)
For our example
y     p(y)
0     0.64980
1     0.30305
2     0.04455
3     0.00255
4     0.00005
2. Expected Values
Random variables and their distributions provide a way to model random
behavior and populations.
Parameters are important characteristics of populations.
For example,
• the typical number of problems which occur each day
• the variability in the number of problems which occur.
We have already outlined a distribution which describes this number.
We can use this distribution to define measures of typical and of variability.
Let Y be the discrete random variable of interest.
For example, let Y be the number of problems which occur on any given day with
the injection molding process for toothpaste tubes.
A measure of the typical value for Y is the population mean, or the expected
value for Y.
$$ \mu = E(Y) = \sum_{y} y \cdot p(y) $$
A measure of the variability of Y is the population variance, σ², defined by
$$ \sigma^2 = E(Y^2) - \mu^2 $$
where
$$ E(Y^2) = \sum_{y} y^2 \cdot p(y) $$
What are the units of the population variance? They are the square of the units of Y.
As a result, we often use the population standard deviation, σ, as a measure
of variability, where
$$ \sigma = \sqrt{\sigma^2} $$
Many texts note that virtually all of the data for a particular distribution should
fall in the interval μ ± 3σ (the empirical rule).
In general, we should take this recommendation with a grain of salt because
very skewed or heavy tailed distributions are exceptions.
The empirical rule does point out that we can begin to describe the behavior
of many distributions with just two measures:
• the population mean and
• the population standard deviation.
Many engineers commonly evaluate their data using this notion of the mean
plus or minus three standard deviations.
Example: Number of problems with an injection molding process for toothpaste tubes.
y       p(y)      y•p(y)    y²    y²•p(y)
0       0.64980   0         0     0
1       0.30305   0.30305   1     0.30305
2       0.04455   0.08910   4     0.17820
3       0.00255   0.00765   9     0.02295
4       0.00005   0.00020   16    0.00080
Total             0.4             0.505
$$ \mu = E(Y) = \sum_{y} y \cdot p(y) = 0.4 $$
$$ E(Y^2) = \sum_{y} y^2 \cdot p(y) = 0.505 $$
$$ \sigma^2 = E(Y^2) - \mu^2 = 0.505 - (0.4)^2 = .345 $$
$$ \sigma = \sqrt{\sigma^2} = 0.587 $$
Note:
μ ± 3σ = 0.4 ± 3(0.587) = (-1.361, 2.161)
The chances of seeing data within this interval are 99.74%.
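The table arithmetic can be checked with a few lines. This sketch takes the probability function p(y) from the example and computes the mean, variance, standard deviation, and the probability of landing inside μ ± 3σ; the variable names are illustrative.

```python
from math import sqrt

# Probability function p(y) for the number of problems per day
pmf = {0: 0.64980, 1: 0.30305, 2: 0.04455, 3: 0.00255, 4: 0.00005}

mean = sum(y * p for y, p in pmf.items())              # E(Y)
second_moment = sum(y ** 2 * p for y, p in pmf.items())  # E(Y^2)
variance = second_moment - mean ** 2
std_dev = sqrt(variance)

low, high = mean - 3 * std_dev, mean + 3 * std_dev
coverage = sum(p for y, p in pmf.items() if low <= y <= high)

print(mean, second_moment, variance, std_dev)
print((low, high), coverage)
```

Only y = 0, 1, and 2 fall inside the interval, so the coverage is just F(2).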
3. Binomial Distribution
The manufacturer of nickel battery plate has imposed a tight initial weight
specification which is difficult to meet.
Consider the next three attempts made by an operator who has a 40% chance of
being successful.
Let S represent a successful attempt.
Let F represent a failed attempt.
Let Y represent the number of successful attempts she makes.
Consider the probability that exactly two out of these three attempts are
successful, i.e., P(Y = 2).
The possible ways she can get exactly two successful attempts are
(SSF) (SFS) (FSS)
Since these events are mutually exclusive, then the probability of exactly two
successful attempts is
P(Y = 2) = P(SSF) + P(SFS) + P(FSS)
In this situation, we can reasonably assume that each attempt is independent of
the others.
Let p be the probability that she succeeds in meeting the weight specification on
any given attempt.
Thus, p = 0.4.
Let q = 1 - p be the probability that she fails. In this specific case, q = .6.
Since each attempt is independent of the others, then
P(SSF) = P(S) • P(S) • P(F) = p • p • q = p² • q = 0.096
P(SFS) = P(S) • P(F) • P(S) = p • q • p = p² • q = 0.096
P(FSS) = P(F) • P(S) • P(S) = q • p • p = p² • q = 0.096
As a result,
P(Y = 2) = P(SSF) + P(SFS) + P(FSS)
= p² • q + p² • q + p² • q
= 3 • p² • q
= (number of ways to get 2 successes in 3 attempts) • p² • q
= 3 (0.096) = 0.288
In general, if she makes n total attempts, the probability that she succeeds exactly y
times is
$$ P(Y = y) = (\text{number of ways to get } y \text{ successes out of } n) \cdot p^y \cdot q^{n-y} $$
We commonly use the binomial coefficient $\binom{n}{y}$ to denote the number of ways to
get y successes from n total attempts.
We define $\binom{n}{y}$ by
$$ \binom{n}{y} = \frac{n!}{y!\,(n-y)!} $$
By definition, 0! = 1.
We now can write the probability of obtaining exactly y successes out of n total
attempts as
$$
\begin{aligned}
P(Y = y) &= (\text{number of ways to get } y \text{ successes out of } n \text{ tries}) \cdot p^y \cdot q^{n-y} \\
&= \binom{n}{y} p^y q^{n-y} \\
&= \frac{n!}{y!\,(n-y)!}\, p^y q^{n-y}
\end{aligned}
$$
Consider an experiment which meets the following conditions:
1. the experiment consists of a fixed number of trials, n;
2. each trial can result in one of only two possible outcomes: a “success” or a
“failure”;
3. the probability, p, of a “success” is constant for each trial;
4. the trials are independent; and
5. the random variable of interest, Y, is the number of successes over the n
trials.
If these conditions hold, then Y is said to follow a binomial distribution with
parameters n and p.
The probability function for a binomial random variable is
$$ p(y) = \binom{n}{y} p^y q^{n-y} = \frac{n!}{y!\,(n-y)!}\, p^y q^{n-y} $$
The mean, variance, and standard deviation are
$$ \mu = E(Y) = np $$
$$ \sigma^2 = npq $$
$$ \sigma = \sqrt{npq} $$
Example
NASA downloads massive data files from a specific satellite three times a day.
Historically, the probability that the data file is corrupted during transmission is .10.
Consider a day's set of transmissions.
What is the probability that exactly two data files are corrupted?
Let Y = number of files corrupted.
$$
\begin{aligned}
P(Y = 2) = p(2) &= \binom{n}{y} p^y q^{n-y} = \binom{3}{2} (.1)^2 (.9)^1 \\
&= \frac{3!}{2!\,(1!)} (.1)^2 (.9) \\
&= \frac{3 \cdot 2!}{2!} (.1)^2 (.9) = 3 (.1)^2 (.9) \\
&= 0.027
\end{aligned}
$$
Find the mean number of files corrupted.
μ = np = 3(.1) = .3
Find the variance and standard deviation for the number of files corrupted.
σ² = npq = 3(.1)(.9) = .27
$$ \sigma = \sqrt{\sigma^2} = \sqrt{.27} = 0.52 $$
Using the empirical rule, we expect virtually all of the data to fall within the interval
μ ± 3σ = 0.3 ± 3(0.52) = (-1.26, 1.86)
As a result, we should rarely see 2 or more corrupted files.
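A short sketch of the binomial probability function, applied to both the satellite example (n = 3, p = .10) and the earlier battery plate example (n = 3, p = .4). It uses `math.comb` (Python 3.8 or later) for the binomial coefficient; the function name is illustrative.

```python
from math import comb, sqrt

def binomial_pmf(y, n, p):
    """P(Y = y) for a binomial random variable with parameters n and p."""
    return comb(n, y) * p ** y * (1 - p) ** (n - y)

# Satellite example: three transmissions a day, each corrupted with probability .10
n, p = 3, 0.10
p_two_corrupted = binomial_pmf(2, n, p)
mean = n * p
variance = n * p * (1 - p)
std_dev = sqrt(variance)
print(p_two_corrupted, mean, variance, std_dev)

# Battery plate example: exactly two successes in three attempts with p = .4
p_two_successes = binomial_pmf(2, 3, 0.4)
print(p_two_successes)
```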
4. Poisson Distribution
Many engineering problems require us to model the random behavior of small
counts.
For example, a manufacturer of nickel-hydrogen batteries ran into a problem
with cells shorting out prematurely.
Each cell used 60 nickel plates.
The manufacturer and its customer cut open several cells and discovered that
the problem cells all had plates with “blisters” while the good cells did not.
Two possible approaches:
• Classify each plate as either conforming (blister free) or
non-conforming (one or more blisters).
-- Model with a binomial distribution.
-- Reduces the data into either acceptable or not acceptable.
-- Often ignores the subtleties in the data.
• Count the number of blisters on each plate.
-- Conforming plates have counts of 0.
-- Non-conforming plates have counts of 1 or more.
-- A plate with many blisters truly is defective and does short out a cell.
-- A plate with only one blister may function perfectly well.
Counting the number of blisters provides more information about the specific problem.
The Poisson distribution often proves useful for modeling small counts.
Let λ be the rate of these counts.
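If Y follows a Poisson distribution, then
$$ p(y) = P(Y = y) = \begin{cases} \dfrac{\lambda^y e^{-\lambda}}{y!} & y = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} $$
with
$$ \mu = E(Y) = \lambda $$
$$ \sigma^2 = \lambda $$
$$ \sigma = \sqrt{\lambda} $$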
Example: Consider a maintenance manager of an industrial facility.
Historically, a certain department averages six repairs per week.
What is the probability that during a randomly selected week, this department
will require exactly two repairs?
Let Y = number of repairs.
$$ P(Y = 2) = \frac{\lambda^y e^{-\lambda}}{y!} = \frac{(6)^2 e^{-6}}{2!} = \frac{36}{2} (e^{-6}) = 0.0446 $$
What is the probability of at least one repair?
$$
\begin{aligned}
P(Y \ge 1) &= 1 - P(Y = 0) \\
&= 1 - \frac{(6)^0 e^{-6}}{0!} \\
&= 1 - e^{-6} \\
&= 1 - .0025 \\
&= .9975
\end{aligned}
$$
What is the expected number of repairs?
$$ \mu = E(Y) = \lambda = 6 $$
What are the variance and standard deviation for the number of
repairs?
$$ \sigma^2 = \lambda = 6 $$
$$ \sigma = \sqrt{\lambda} = \sqrt{6} = 2.45 $$
Using the empirical rule, we expect virtually all of the data to fall within the
interval
$$ \mu \pm 3\sigma = 6 \pm 3(2.45) = (-1.35, 13.35) $$
As a result, we should rarely see 14 or more repairs in any given week.
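The repair example can be checked directly from the Poisson probability function. This is a minimal sketch with an illustrative function name; λ = 6 repairs per week comes from the example.

```python
from math import exp, factorial, sqrt

def poisson_pmf(y, lam):
    """P(Y = y) for a Poisson random variable with rate lam."""
    return lam ** y * exp(-lam) / factorial(y)

lam = 6  # average repairs per week
p_exactly_two = poisson_pmf(2, lam)
p_at_least_one = 1 - poisson_pmf(0, lam)
std_dev = sqrt(lam)

# Probability of 14 or more repairs, i.e. above the upper end of the 3-sigma interval
p_14_or_more = 1 - sum(poisson_pmf(y, lam) for y in range(14))

print(p_exactly_two, p_at_least_one, std_dev, p_14_or_more)
```

The last value puts a number on "rarely": it is the chance of a week that falls above μ + 3σ.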
C. Continuous Random Variables
1. Overview
The continuous random variables studied in this course have probability density
functions, f(y).
Some Properties of f(y):
1. $f(y) \ge 0$
2. $\int_{-\infty}^{\infty} f(y)\,dy = 1$
3. $F(y_0) = P(Y \le y_0) = \int_{-\infty}^{y_0} f(y)\,dy$
4. $P(y_1 \le Y \le y_2) = \int_{y_1}^{y_2} f(y)\,dy = F(y_2) - F(y_1)$
5. $P(Y = y_0) = \int_{y_0}^{y_0} f(y)\,dy = 0$
A very important example of a continuous random variable is one which follows
an exponential distribution.
The exponential distribution often provides an excellent model for describing the
behavior of equipment life times.
Example:
The times between repairs for an ethanol-water distillation column are well
modeled by an exponential distribution which has the form
$$ f(y) = \begin{cases} \lambda e^{-\lambda y} & y \ge 0,\ \lambda > 0 \\ 0 & \text{otherwise} \end{cases} $$
where λ is the rate of repairs.
In this case, λ = .001 repairs/hr.
Thus, this column, on the average, requires 1 repair every 1000 hours of
operation.
What is the probability that the next time to repair will be less than 100 hours
from the previous repair?
$$ P(Y \le y_0) = \int_{-\infty}^{y_0} f(y)\,dy $$
For our example,
$$ P(Y \le y_0) = \int_{0}^{y_0} \lambda e^{-\lambda y}\,dy = \left. -e^{-\lambda y} \right|_{0}^{y_0} = 1 - e^{-\lambda y_0} $$
In our case, λ = .001 and y0 = 100; thus,
$$ P(Y \le 100) = 1 - e^{-(.001)(100)} = 1 - e^{-.1} = 1 - 0.905 = 0.095 $$
What is the probability that the time between repairs will be between 500 and
1500 hours?
$$ P(y_1 \le Y \le y_2) = \int_{y_1}^{y_2} \lambda e^{-\lambda y}\,dy = \left. -e^{-\lambda y} \right|_{y_1}^{y_2} = e^{-\lambda y_1} - e^{-\lambda y_2} $$
In this case, y1 = 500 and y2 = 1500; thus,
$$ P(500 \le Y \le 1500) = e^{-0.5} - e^{-1.5} = 0.383 $$
2. Expected Values – Revisited
For a continuous random variable, Y, the expected value is
$$ \mu = E(Y) = \int_{-\infty}^{\infty} y f(y)\,dy $$
The variance of Y is once again
$$ \sigma^2 = E(Y^2) - \mu^2 $$
where
$$ E(Y^2) = \int_{-\infty}^{\infty} y^2 f(y)\,dy $$
Once again, the standard deviation is
$$ \sigma = \sqrt{\sigma^2} $$
Example: The time between repairs
We said these times were well modeled by an exponential distribution with λ = .001
repairs/hr.
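$$ f(y) = \begin{cases} \lambda e^{-\lambda y} & y \ge 0,\ \lambda > 0 \\ 0 & \text{otherwise} \end{cases} $$
$$ E(Y) = \int_{-\infty}^{\infty} y f(y)\,dy = \int_{0}^{\infty} y \lambda e^{-\lambda y}\,dy = \frac{1}{\lambda} = \frac{1}{.001} = 1000 \text{ hours} $$
$$ E(Y^2) = \int_{-\infty}^{\infty} y^2 f(y)\,dy = \int_{0}^{\infty} y^2 \lambda e^{-\lambda y}\,dy = \frac{2}{\lambda^2} $$
$$ \sigma^2 = E(Y^2) - \mu^2 = \frac{2}{\lambda^2} - \left(\frac{1}{\lambda}\right)^2 = \frac{1}{\lambda^2} $$
$$ \sigma = \sqrt{\sigma^2} = \frac{1}{\lambda} $$
In our case,
$$ \sigma^2 = \left(\frac{1}{.001}\right)^2 = 1{,}000{,}000 $$
$$ \sigma = \frac{1}{.001} = 1000 $$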
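The exponential results for the distillation column can be checked numerically. The sketch below uses the closed-form CDF, 1 - e^(-λy), for the two probability questions, and a crude midpoint-rule sum (truncated at 20,000 hours, an arbitrary cutoff chosen for this sketch) to confirm that the integral for E(Y) comes out at 1/λ. The names are illustrative.

```python
from math import exp

rate = 0.001  # repairs per hour, from the example

def exponential_cdf(y, rate):
    """P(Y <= y) for an exponential distribution with the given rate."""
    return 1.0 - exp(-rate * y) if y > 0 else 0.0

p_within_100 = exponential_cdf(100, rate)
p_500_to_1500 = exponential_cdf(1500, rate) - exponential_cdf(500, rate)

mean_hours = 1.0 / rate     # closed form for E(Y)
std_dev_hours = 1.0 / rate  # closed form for sigma

# Midpoint-rule approximation of E(Y) = integral of y * f(y) over (0, 20000)
step = 1.0
approx_mean = sum(
    (k + 0.5) * step * rate * exp(-rate * (k + 0.5) * step) * step
    for k in range(20000)
)

print(p_within_100, p_500_to_1500, mean_hours, approx_mean)
```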
3. Relationship of Distributions and Data Displays
Distributions can provide a powerful basis for modeling the random behavior of
important characteristics of interest.
Formal statistical analyses require certain assumptions about the underlying
distribution of the data.
Typically, these assumptions center on the “shape” of the data.
Appropriate data displays provide a quick and easy way to check these
assumptions, especially the stem-and-leaf display and the histogram.
The theoretical shape of a stem-and-leaf display for a given set of data is
• the probability function, p(y) for a discrete random variable,
and
• the pdf, f(y), for a continuous random variable.
Example: Times Between Industrial Accidents
Lucas (1985) analyzed the times between accidents at an industrial facility.
We can model these times by an exponential distribution with λ = 0.05.
The following plot graphs the pdf for this specific distribution.
Consider overlaying an appropriately scaled plot of the pdf on a histogram of the
data.
This plot indicates that the exponential distribution does provide a reasonable
basis for modeling these times.
4. The Normal Distribution
The normal distribution is the single most important distribution in classical
statistics.
Many naturally occurring phenomena are well modeled by this distribution.
Let Y be a normally distributed random variable. Its pdf is given by
$$ f(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2} $$
Note: the pdf depends on the parameters
• μ – the population mean
• σ² – the population variance
Thus,
E(Y) = μ
var(Y) = σ²
The plot of the pdf looks like
The plot is single peaked, centered at μ, symmetric, and the tails die out rapidly.
• 68.3% of the area of the curve falls within the interval μ ± σ
• 95.4% falls within μ ± 2σ
• 99.7% falls within μ ± 3σ
We can find any probabilities we need through the standard normal random
variable.
The standard normal distribution has
• μ = 0
• σ² = 1
We denote a standard normal random variable by Z.
The values listed in Table I of the Appendix are P(Z ≤ z0).
Thus, P(Z ≤ 1.96) = 0.9750
Consider P(Z > z0)
P(Z > 2.33) = 1- P(Z ≤ 2.33) = 1 - .9901 = .0099
Finally, consider P(z1 ≤ Z ≤ z2).
For example, consider P(-1.00 ≤ Z ≤ 1.50):
P(-1.00 ≤ Z ≤ 1.50) = P(Z ≤ 1.50) - P(Z ≤ -1.00) = .9332 - .1587 = .7745.
We often need to use the Z-value associated with specific “tail” areas of the
standard normal distribution.
Let $z_\alpha$ represent the Z-value associated with a right hand “tail area” of α; that is,
$z_\alpha$ is that value for Z such that
$$ P(Z > z_\alpha) = \alpha $$
As a result, $z_\alpha$ is that value from the table which satisfies
$$ 1.0 - P(Z \le z_\alpha) = \alpha $$
or
$$ P(Z \le z_\alpha) = 1.0 - \alpha $$
that is, the value from the table = 1.0 - α.
For example, z0.025 is that Z such that
P(Z ≤ z0.025) = 1.0 - 0.025 = 0.975.
Looking into the body of the table, we obtain
z0.025 = 1.96
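In place of the printed table, the standard library's `statistics.NormalDist` (Python 3.8 or later) can supply the same values. This sketch reproduces the table lookups above; `cdf` plays the role of the table body and `inv_cdf` the role of reading the table backwards.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

p_below_196 = Z.cdf(1.96)                  # P(Z <= 1.96)
p_above_233 = 1 - Z.cdf(2.33)              # P(Z > 2.33)
p_between = Z.cdf(1.50) - Z.cdf(-1.00)     # P(-1.00 <= Z <= 1.50)
z_025 = Z.inv_cdf(1 - 0.025)               # z-value with right-hand tail area 0.025

print(p_below_196, p_above_233, p_between, z_025)
```

The results differ from the table values only in the rounding of the fourth decimal place.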
We can transform any normal random variable, Y, to a standard normal, Z, by
$$ Z = \frac{Y - \mu}{\sigma} $$
By subtracting μ, which is the expected value of Y, we recenter the random variable
around 0, so the expected value of Z is 0.
By dividing by σ, we rescale the random variable so that the variance is 1; the
resulting Z value represents the number of standard deviations a value of the
random variable lies from its mean.
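The standardization can be sketched in a few lines. The numbers used here are those of the bottling example that follows (mean 12 oz, variance .04 oz²); the helper name `standardize` is illustrative, and `statistics.NormalDist` stands in for the normal table.

```python
from math import sqrt
from statistics import NormalDist

def standardize(y, mu, sigma):
    """Number of standard deviations that y lies from the mean mu."""
    return (y - mu) / sigma

mu = 12.0            # mean fill, oz
sigma = sqrt(0.04)   # standard deviation, oz (variance is .04 oz^2)
Z = NormalDist()     # standard normal

z_upper = standardize(12.5, mu, sigma)
z_lower = standardize(11.75, mu, sigma)

p_more_than_12_5 = 1 - Z.cdf(z_upper)          # P(Y > 12.5)
p_between = Z.cdf(z_upper) - Z.cdf(z_lower)    # P(11.75 <= Y <= 12.5)

print(z_lower, z_upper, p_more_than_12_5, p_between)
```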
Example: Suppose that you are an engineer assigned to the bottling department
of the Busch Beer Company.
A particular 12 oz. bottling machine is known to dispense beer according to a
normal distribution with a mean of 12 oz and a variance of .04 oz².
What is the probability that this machine dispenses more than 12.5 oz?
Let Y be the amount dispensed. We seek
$$
\begin{aligned}
P(Y > 12.5) &= P(Y - \mu > 12.5 - \mu) \\
&= P\left(\frac{Y - \mu}{\sigma} > \frac{12.5 - \mu}{\sigma}\right) \\
&= P\left(Z > \frac{12.5 - 12.0}{.2}\right) \\
&= P(Z > 2.5) \\
&= 1 - P(Z \le 2.5) \\
&= 1 - .9938 = .0062
\end{aligned}
$$
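What is the probability that between 11.75 and 12.5 oz. are dispensed?
$$
\begin{aligned}
P(11.75 \le Y \le 12.5) &= P\left(\frac{11.75 - \mu}{\sigma} \le \frac{Y - \mu}{\sigma} \le \frac{12.5 - \mu}{\sigma}\right) \\
&= P\left(\frac{11.75 - 12.0}{.2} \le Z \le \frac{12.5 - 12.0}{.2}\right) \\
&= P(-1.25 \le Z \le 2.5) \\
&= P(Z \le 2.5) - P(Z \le -1.25) \\
&= .9938 - .1056 \\
&= .8882
\end{aligned}
$$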
D. Random Behavior of Means
1. The Sample Mean
Definition: Sample mean
Let y1, y2, …, yn be a sample of n observations.
The sample mean, ȳ, is given by
$$ \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i $$
The sample mean is a measure of the typical value for a data set.
It represents the “center of gravity”.
Example: Battery Plate Porosities
Nickel-hydrogen (Ni-H) batteries use a nickel plate as their anode.
A critical quality characteristic is the plate's porosity which controls the interface
of the anode with the potassium hydroxide electrolyte solution.
A recent random sample of ten porosities yielded:
79.1 79.5 79.3 79.3 78.8
79.0 79.2 79.7 79.0 79.2
The sample mean is
$$ \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i = \frac{792.1}{10} = 79.21 $$
2. Random Samples
Definition: Random Sample
Let y1, y2, …, yn be a sample of n observations taken from some population.
If these observations are independent of each other and if each observation
follows the same distribution, then
y1, y2, …, yn
is said to be a random sample.
All the distribution theory of classical statistics is based upon this concept of a
random sample.
3. Central Limit Theorem
Consider taking a series of random samples, all of size n, from some population,
and calculating ȳ for each one.
Since the data are random, ȳ is also a random variable!
An important question: What is its distribution?
If the population from which we sample is normal, then ȳ also follows a normal
distribution.
But, how often do you know that the population really is normal?
Very Rarely!
The Central Limit Theorem:
Better Known as the Statistician's Full Employment Act.
Consider a population with mean μ and variance σ².
As the sample size, n, approaches infinity, the distribution of
$$ Z = \frac{\bar{y} - \mu}{\sigma / \sqrt{n}} $$
approaches the standard normal distribution.
Bottom line: If n is sufficiently large, then ȳ approximately follows a normal
distribution with
• mean μ
• “standard error” σ/√n
Z represents the number of standard errors ȳ lies from μ.
What is the catch?
What constitutes sufficiently large?
If the parent population is normal, n = 1 is large enough.
If the population is symmetric and the tails die out rapidly,
then n = 3-5 is large enough.
A classic example is the uniform distribution.
Note:
• The distribution is symmetric.
• It does not have a unique peak.
• When its tails die, they die!
In this case, sample sizes of 6-12 are considered adequate for applying the
Central Limit Theorem.
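A small simulation can make this concrete. The sketch below draws many samples of size 12 from a uniform distribution on (0, 1), which has mean 0.5 and variance 1/12, and compares the behavior of the sample means with what the Central Limit Theorem predicts. The seed, the number of repetitions, and the choice of the (0, 1) interval are arbitrary choices for this sketch.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

random.seed(12345)  # fixed seed so the run is repeatable

n = 12        # sample size
reps = 20000  # number of simulated samples

# One sample mean per simulated sample of n uniform(0, 1) observations
sample_means = [fmean([random.random() for _ in range(n)]) for _ in range(reps)]

mu = 0.5                               # mean of uniform(0, 1)
standard_error = sqrt(1 / 12) / sqrt(n)  # sigma / sqrt(n)

mean_of_means = fmean(sample_means)
sd_of_means = pstdev(sample_means)

# Under normality, about 95% of sample means lie within 1.96 standard errors of mu
within = sum(abs(m - mu) <= 1.96 * standard_error for m in sample_means) / reps

print(mean_of_means, sd_of_means, standard_error, within)
```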
As the parent distribution looks less and less normal, the sample size required to
assume the Central Limit Theorem gets larger.
Important point: When determining if the sample size is big enough, we need to
look at the distribution for the parent population.
In practice, what must we check to see whether the Central Limit Theorem applies?
• Stem-and-Leaf displays
• Normal Probability Plots
4. Normal Probability Plot
The normal probability plot is a simple graphical tool for assessing if the data
come close to following a normal distribution.
Many software packages generate it automatically.
If the data follow a normal distribution, the normal probability plot should look
like a straight line.
Significant deviations from the straight line suggest that the data are not “well-behaved”.
A reasonable question: How straight is straight?
Many analysts use the “fat pencil” rule.
For a suitably scaled plot, if we can cover the points with a fat pencil, the line is
straight enough.
Example: The Plate Porosities
The Stem-and-Leaf Display
Stem    Leaves   No.   Depth
78.•:   8        1     1
79.*:   001      3     4
79.t:   2233     4
79.f:   5        1     2
79.s:   7        1     1
The Normal Probability Plot
[Figure: normal probability plot of the porosities. Vertical axis: quantiles of standard normal (-2 to 2); horizontal axis: y (78.8 to 79.7).]
For a sample size of 10, we should feel reasonably comfortable assuming the
Central Limit Theorem in this case.
5. Using the Central Limit Theorem
Suppose the historic standard deviation for these porosities has been 0.25.
Suppose further that the target porosity is 79.0.
A reasonable question: What is the probability that we see a sample mean for 10
porosities ≥ 79.21 (the observed sample mean from our sample)?
We seek P(ȳ ≥ 79.21).
$$
\begin{aligned}
P(\bar{y} \ge 79.21) &= P\left(\frac{\bar{y} - \mu}{\sigma / \sqrt{n}} \ge \frac{79.21 - \mu}{\sigma / \sqrt{n}}\right) \\
&= P\left(Z \ge \frac{79.21 - \mu}{\sigma / \sqrt{n}}\right)
\end{aligned}
$$
In our case
μ = 79.0
σ = 0.25
n = 10
Thus,
$$
\begin{aligned}
P(\bar{y} \ge 79.21) &= P\left(Z \ge \frac{79.21 - 79.0}{0.25 / \sqrt{10}}\right) \\
&= P(Z \ge 2.66) \\
&= 1 - [P(Z \le 2.66)] \\
&= 1 - .9961 = .0039
\end{aligned}
$$
Note: it is a very rare event to see an average of ten porosities greater than or
equal to 79.21 when the true mean porosity is 79.0.
But we actually observed an average of 79.21, which suggests that the true
mean porosity, at least for the time period studied, is larger than 79.0.
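The same calculation as a sketch, starting from the ten porosities. The target mean (79.0) and historic standard deviation (0.25) are the values assumed in the example; `statistics.NormalDist` stands in for the normal table, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, fmean

porosities = [79.1, 79.5, 79.3, 79.3, 78.8, 79.0, 79.2, 79.7, 79.0, 79.2]

mu = 79.0      # target porosity
sigma = 0.25   # historic standard deviation
n = len(porosities)

y_bar = fmean(porosities)
standard_error = sigma / sqrt(n)
z = (y_bar - mu) / standard_error      # standard errors above the target
p_upper_tail = 1 - NormalDist().cdf(z)  # P(sample mean >= observed), via the CLT

print(y_bar, standard_error, z, p_upper_tail)
```

Because the code does not round z to two decimals before the lookup, the tail probability can differ from the table-based value in the fourth decimal place.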
E. Random Behavior of Means, Variance Unknown
1. The Sample Variance
When the variance, σ², is known, the Central Limit Theorem suggests that
$$ \frac{\bar{y} - \mu}{\sigma / \sqrt{n}} $$
follows a standard normal distribution if n is big enough.
What would seem to be a logical thing to do when σ2 is unknown?
ESTIMATE IT!
Definition: The Sample Variance.
Let y1, y2, …, yn be a random sample of n observations.
The sample variance, s², is defined by
$$ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2 $$
Note:
• s² looks like an “average”.
In fact, it is the “average” squared deviation from ȳ, using n-1 instead of n in
the denominator.
• The reason for using n-1 will be discussed later.
• s² ≥ 0
The sample standard deviation, s, is
$$ s = \sqrt{s^2} $$
The computational form of s² is
$$ s^2 = \frac{n \sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}{n(n-1)} $$
Example: Thicknesses of Silicon Wafers
A major semiconductor manufacturer grinds wafers in batches of 31.
For this particular product, suppose that the target thickness is 244 μm.
A random sample yielded the following results:
240 243 250 253 248
The sample mean, sample variance, and the sample standard deviation are
$$ \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i = \frac{1234}{5} = 246.8 $$
$$ s^2 = \frac{n \sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}{n(n-1)} = \frac{5(304662) - (1234)^2}{5(4)} = 27.7 $$
$$ s = \sqrt{s^2} = \sqrt{27.7} = 5.263 $$
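A brief sketch that computes the sample variance of the five wafer thicknesses both ways, from the definition and from the computational form, and checks them against the standard library's `statistics.variance` (which also divides by n-1). Variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

thickness = [240, 243, 250, 253, 248]  # wafer thicknesses
n = len(thickness)

y_bar = mean(thickness)

# Definition: sum of squared deviations from the mean, divided by n - 1
s2_definition = sum((y - y_bar) ** 2 for y in thickness) / (n - 1)

# Computational form: [n * sum(y^2) - (sum y)^2] / [n (n - 1)]
sum_y = sum(thickness)
sum_y2 = sum(y * y for y in thickness)
s2_computational = (n * sum_y2 - sum_y ** 2) / (n * (n - 1))

s = sqrt(s2_computational)

print(y_bar, sum_y, sum_y2, s2_definition, s2_computational, s)
print(variance(thickness))  # library value, for comparison
```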
2. The t Distribution
Question: What distribution does
$$ \frac{\bar{y} - \mu}{s / \sqrt{n}} $$
follow?
If the data come from a parent distribution which follows a normal distribution,
then
$$ \frac{\bar{y} - \mu}{s / \sqrt{n}} $$
follows a t distribution with n-1 degrees of freedom.
1. The t statistic represents the number of estimated standard errors a given
value for ȳ lies from its mean.
2. The t distribution is a shorter, squatter version of the Z distribution.
3. As n gets sufficiently large, the t distribution with n-1 degrees of freedom is
well approximated by the Z distribution.
4. The t statistic is well known to be “robust” to the normality assumption.
In general, we feel comfortable using the t statistic whenever we sample from a
“well-behaved” distribution.
As the sample size gets bigger, the parent distribution can be less and less well-behaved.
Example: Thicknesses of Silicon Wafers, Continued
For this particular product, suppose that the target thickness is 244 μm.
A random sample yielded the following results:
240 243 250 253 248
We already have found
• ȳ = 246.8
• s² = 27.7
• s = 5.263
Our t statistic is
$$ t = \frac{\bar{y} - \mu}{s / \sqrt{n}} = \frac{246.8 - 244}{5.263 / \sqrt{5}} = 1.19 $$
This t value suggests that the observed sample mean is reasonably close to the target value:
it lies only about 1.19 estimated standard errors above it.
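The t statistic for the wafer data, computed directly from the five observations. This is a minimal sketch; `statistics.stdev` uses the n-1 divisor, matching the sample standard deviation defined above, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

thickness = [240, 243, 250, 253, 248]  # wafer thicknesses
target = 244                           # target thickness

n = len(thickness)
y_bar = mean(thickness)
s = stdev(thickness)                   # sample standard deviation (n - 1 divisor)
estimated_standard_error = s / sqrt(n)

t = (y_bar - target) / estimated_standard_error
degrees_of_freedom = n - 1

print(t, degrees_of_freedom)
```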
F. The Normal Approximation to the Binomial Distribution
Recall the binomial distribution.
• Y represents the number of “successes” in n trials.
• p is the probability of a success on any given trial.
• n is the total number of trials.
• μ = E(Y) = np.
• σ2 = np(1-p) = npq.
It can be shown that as n gets large, the distribution of
$$ \frac{Y - np}{\sqrt{np(1-p)}} $$
approaches a standard normal.
Bottom line: If n is sufficiently large, the binomial distribution is well approximated
by a normal.
What is sufficiently large?
General Rule of Thumb: n is sufficiently large if
• np > 5, (the expected number of successes), and
• n(1-p) > 5 (the expected number of failures).
There is a slight catch.
Let y0 be an integer.
Consider P(Y = y0).
Remember, Y follows a binomial distribution, which is discrete; therefore,
P(Y = y0) > 0 for 0 ≤ y0 ≤ n.
But P(Y = y0) = 0 for a normal random variable.
What should we do?
Recall my example of my height.
To be 6’1” tall means that someone is between 6’0½” and 6’1½” tall.
We shall do the same thing now (this is called a continuity correction factor).
Let Y* be a normally distributed random variable with
• mean np
• variance npq.
Note:
1. Y* has the same mean and the same variance as Y, the original binomial
random variable.
2. $\frac{Y^* - np}{\sqrt{np(1-p)}}$ is a standard normal random variable.
We can approximate P(Y = a) by
$$ P(Y = a) \approx P\left(a - \tfrac{1}{2} \le Y^* \le a + \tfrac{1}{2}\right) $$
Similarly,
• $P(Y \le a) \approx P(Y^* \le a + .5)$
• $P(Y < a) \approx P(Y^* \le a - .5)$
• $P(Y \ge a) \approx P(Y^* \ge a - .5)$
• $P(Y > a) \approx P(Y^* \ge a + .5)$
Example: Consider a production line of decorative bricks.
Historically, the probability that any given brick is rejected is 0.01.
Suppose an inspector examines 1000 bricks per day.
What is the probability that she rejects fewer than 2 bricks?
Let Y = number of defective bricks found.
Note, we seek P(Y < 2).
$$
\begin{aligned}
P(Y < 2) &\approx P(Y^* \le 2 - .5) \\
&= P\left(\frac{Y^* - np}{\sqrt{np(1-p)}} \le \frac{1.5 - np}{\sqrt{np(1-p)}}\right) \\
&= P\left(Z \le \frac{1.5 - 1000(0.01)}{\sqrt{1000(0.01)(.99)}}\right) \\
&= P(Z \le -2.70) \\
&= .0035
\end{aligned}
$$
What is the probability that she rejects between 8 and 13 bricks, inclusive?
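We seek P(8 ≤ Y ≤ 13).
$$
\begin{aligned}
P(8 \le Y \le 13) &\approx P(8 - .5 \le Y^* \le 13 + .5) \\
&= P\left(\frac{7.5 - np}{\sqrt{np(1-p)}} \le \frac{Y^* - np}{\sqrt{np(1-p)}} \le \frac{13.5 - np}{\sqrt{np(1-p)}}\right) \\
&= P\left(\frac{7.5 - 10}{\sqrt{10(.99)}} \le Z \le \frac{13.5 - 10}{\sqrt{10(.99)}}\right) \\
&= P(-0.79 \le Z \le 1.11) \\
&= .8665 - .2148 \\
&= .6517
\end{aligned}
$$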
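A sketch of the brick example that applies the normal approximation with the half-unit correction and, for comparison, sums the exact binomial probabilities. `statistics.NormalDist` stands in for the normal table and `math.comb` supplies the binomial coefficient; the exact calculation is an addition for comparison and does not appear in the notes.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 1000, 0.01
mu = n * p
sigma = sqrt(n * p * (1 - p))
y_star = NormalDist(mu=mu, sigma=sigma)  # normal with the binomial's mean and variance

def binomial_pmf(y, n, p):
    """Exact binomial probability P(Y = y)."""
    return comb(n, y) * p ** y * (1 - p) ** (n - y)

# Normal approximation with the half-unit correction
approx_fewer_than_2 = y_star.cdf(2 - 0.5)
approx_8_to_13 = y_star.cdf(13 + 0.5) - y_star.cdf(8 - 0.5)

# Exact binomial sums for the same events
exact_fewer_than_2 = sum(binomial_pmf(y, n, p) for y in range(0, 2))
exact_8_to_13 = sum(binomial_pmf(y, n, p) for y in range(8, 14))

print(approx_fewer_than_2, exact_fewer_than_2)
print(approx_8_to_13, exact_8_to_13)
```

The printed pairs let you judge how well the approximation does in each case; tail probabilities are typically approximated less well than probabilities for intervals near the mean.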