Combinatorics
What is Combinatorics?

Definition: The methodology of arranging or selecting elements of a finite set according to some rules.

Three basic problems:
- Existence: whether some arrangement is possible.
- Counting: count the number of ways in which the corresponding arrangement can be carried out.
- Construction: generate all possible arrangements.

Many problems in probability theory require that we count the number of ways in which a particular event can occur (counting problems).
Multiplication Rule

Fact: If the set A has m elements and the set B has n elements, then the set A×B has m⋅n elements.

Multiplication Rule: Suppose that an arrangement or selection process consists of two sequential parts. The first part can occur in n ways, the second part in m ways, and the outcome of one part does not affect that of the other part. Then the process can occur in n⋅m ways.

More generally, suppose a task is to be carried out in a sequence of r stages. There are n1 ways to carry out the first stage; for each of these n1 ways, there are n2 ways to carry out the second stage; for each of these n2 ways, there are n3 ways to carry out the third stage, and so forth. Then the total number of ways in which the entire task can be accomplished is given by the product N = n1 ⋅ n2 ⋯ nr.
Counting Subsets
Fact: If the set A has n elements, then the number of its subsets is 2^n.

Explanation: The number of subsets equals the number of binary n-strings: bit i of the string is 1 iff the i-th element is included in the subset.
( 00…0 ≡ ∅, …, 0…01…0, …, 11…1 ≡ A )
The number of binary n-strings equals the number of elements in B × B × … × B (n times), where B = {0, 1}. By the multiplication rule, this number is 2⋅2⋅…⋅2 = 2^n.
Example: The subsets of {a,b,c} are: ∅, {a}, {b}, {c}, {a,b}, {a,c}, {b,c}, {a,b,c}.
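The binary-string argument can be checked directly; a small Python sketch (the function name `subsets` is illustrative):

```python
from itertools import product

def subsets(elements):
    """Enumerate all subsets of `elements` by pairing each element
    with a 0/1 flag, mirroring the binary n-string argument."""
    out = []
    for bits in product([0, 1], repeat=len(elements)):
        out.append({e for e, b in zip(elements, bits) if b})
    return out

subs = subsets(['a', 'b', 'c'])
print(len(subs))  # 2^3 = 8
```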
Types of Selections
A selection process can be with or without replacement.
In selection with replacement, the number of choices remains the same throughout the selection process; any selected object is eligible to be selected a second or further time.

In selection without replacement, the number of choices decreases one by one; a selected object is no longer eligible for reselection.

The order of the objects being selected may or may not matter in a selection process. If the order matters, selecting A before B is different from selecting B before A. If the order does not matter, i.e., all you care about is which objects you have selected, the collections {A, B} and {B, A} are exactly the same.
Consider selecting two objects from {A, B}:

                      Order does not matter   Order matters
With replacement      {AA, AB, BB}            {AA, AB, BA, BB}
Without replacement   {AB}                    {AB, BA}
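The four cells of this table correspond to four functions in Python's itertools module; a quick sketch for {A, B}:

```python
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)

objects = ['A', 'B']

# Order matters, with replacement: n^k arrangements
ordered_with = list(product(objects, repeat=2))                   # AA, AB, BA, BB
# Order matters, without replacement
ordered_without = list(permutations(objects, 2))                  # AB, BA
# Order does not matter, with replacement
unordered_with = list(combinations_with_replacement(objects, 2))  # AA, AB, BB
# Order does not matter, without replacement
unordered_without = list(combinations(objects, 2))                # AB

print(len(ordered_with), len(ordered_without),
      len(unordered_with), len(unordered_without))  # 4 2 3 1
```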
Variations (order matters)

A variation of n distinguishable objects taken k at a time is an arrangement (selection) in which the order of (the selection of) the objects matters.

The number of variations without replacement:
V(n,k) = n ⋅ (n−1) ⋅ (n−2) ⋯ (n−k+1) = n! / (n−k)!

There are n ways to select the first object; for each of these n ways, there are n−1 ways to select the second object; for each of these n−1 ways, there are n−2 ways to select the third object, and so forth.

The number of variations with replacement:
V̄(n,k) = n ⋅ n ⋅ n ⋯ n = n^k

There are n ways to select the first object; for each of these n ways, there are n ways to select the second object; for each of these n ways, there are n ways to select the third object, and so forth.
Variations (examples)

- How many 4-digit numbers with all digits different are there?
  NDigits = 9 ⋅ V(9,3) = 9 ⋅ 9 ⋅ 8 ⋅ 7 = 4536
- How many 4-digit numbers are there?
  NDigits = 9 ⋅ V̄(10,3) = 9 ⋅ 10 ⋅ 10 ⋅ 10 = 9000
- In poker, each player is randomly dealt 5 cards. Find the number of possible hands (counting the order in which the cards are dealt).
  NHands = V(52,5) = 52 ⋅ 51 ⋅ 50 ⋅ 49 ⋅ 48 = 311875200
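These variation counts can be reproduced with Python's math.perm (available since Python 3.8); note that V(52,5) counts ordered deals:

```python
import math

# 4-digit numbers with distinct digits: first digit from 1-9, then V(9,3)
n_distinct = 9 * math.perm(9, 3)
# All 4-digit numbers: first digit from 1-9, then 10*10*10 (with replacement)
n_all = 9 * 10 ** 3
# Ordered 5-card deals from 52 cards: V(52,5)
n_deals = math.perm(52, 5)

print(n_distinct, n_all, n_deals)  # 4536 9000 311875200
```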
Permutations (order matters)

A permutation of n distinguishable objects is an arrangement (selection) in which the order of the objects matters (permutation = variation without replacement with k = n).

The number of permutations:
P(n) = n ⋅ (n−1) ⋅ (n−2) ⋯ 1 = n!

There are n ways to select the first object; for each of these n ways, there are n−1 ways to select the second object; for each of these n−1 ways, there are n−2 ways to select the third object, and so forth.

Permutations of Non-Distinct Objects: Suppose that there are n objects of k types, where objects of the same type are indistinguishable from each other. Further suppose that there are ni objects of the i-th type, with Σi ni = n.

The number of permutations of non-distinct objects:
P(n1, n2, …, nk) = n! / (n1! n2! ⋯ nk!)
Permutations (examples)

- In how many ways can 2 apples, 3 oranges, and 3 peaches be arranged?
  P(2,3,3) = 8! / (2! 3! 3!) = 560
- How many permutations can be made of the letters of the word "missisipi" (1 m, 4 i's, 3 s's, 1 p)?
  P(1,4,3,1) = 9! / (1! 4! 3! 1!) = 2520
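The multinomial formula from the previous slide is easy to implement; a sketch (the helper name `multinomial` is illustrative):

```python
import math

def multinomial(counts):
    """n! / (n1! n2! ... nk!) for n = sum(counts); exact integer division."""
    result = math.factorial(sum(counts))
    for c in counts:
        result //= math.factorial(c)
    return result

print(multinomial([2, 3, 3]))     # 2 apples, 3 oranges, 3 peaches -> 560
print(multinomial([1, 4, 3, 1]))  # letters of "missisipi": 1 m, 4 i, 3 s, 1 p -> 2520
```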
Combinations (order does not matter)

A combination of n distinguishable objects taken k at a time is a selection in which the order of selection does not matter.

The number of combinations:
C(n,k) = V(n,k) / k! = n! / (k! (n−k)!)
(the binomial coefficient "n choose k")

Because the order does not matter, the number is obtained by dividing the number of variations by the number of permutations, to eliminate all possible orderings of the selected objects.

The number of combinations with replacement:
C̄(n,k) = C(n+k−1, k)

Note that in combinations with replacement, n may be less than k.
Combinations (examples)

Objects {a, b, c}.
Combinations of order 2: {ab, ac, bc}. ( 3 = C(3,2) )
Combinations with replacement of order 2: {aa, ab, ac, bb, bc, cc}. ( 6 = C(3+2−1, 2) = C(4,2) )

- From a group of 7 men and 4 women we choose 6 persons, of whom at least two must be women. Count the number of possibilities.
  NPossibilities = C(7,4)⋅C(4,2) + C(7,3)⋅C(4,3) + C(7,2)⋅C(4,4) = 35⋅6 + 35⋅4 + 21⋅1 = 371
- A flower shop has 5 kinds of flowers. How many bouquets of 11 flowers can be made?
  NBouquets = C̄(5,11) = C(5+11−1, 11) = C(15,11) = (15⋅14⋅13⋅12)/(1⋅2⋅3⋅4) = 1365
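Both worked examples can be verified with Python's math.comb:

```python
import math

# At least two women among 6 persons chosen from 7 men and 4 women:
# sum over the number of women w = 2, 3, 4
n_committees = sum(math.comb(4, w) * math.comb(7, 6 - w) for w in range(2, 5))

# Bouquets of 11 flowers from 5 kinds: combinations with replacement
n_bouquets = math.comb(5 + 11 - 1, 11)

print(n_committees, n_bouquets)  # 371 1365
```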
Examples
There are four suits (spade, heart, club, and diamond) in an ordinary deck of cards. Each suit has 13 denominations (ace, 2, 3, …, 10, jack, queen, king). By convention, the ace is regarded as either 1 or 14; the jack as 11, the queen as 12, the king as 13.

In poker, each player is randomly dealt 5 cards. The five cards form:
- a royal flush if the five cards are 10, J, Q, K, A of the same suit,
- a straight flush if the five cards are of the same suit and of consecutive numbers, but not a royal flush,
- a four-of-a-kind if there are four cards of the same denomination,
- a full house if there are three cards of one denomination and two cards of another,
- a flush if the five cards are of the same suit, but neither a royal flush nor a straight flush,
- a straight if the five cards are of consecutive numbers but not all of the same suit,
- a three-of-a-kind if there are three cards of one denomination and one card each of two other denominations,
- a two-pairs if there are two cards of one denomination, two cards of another denomination, and one card of a third denomination,
- a one-pair if there are two cards of one denomination and one card each from three other denominations, and
- a bust if the five cards are of different denominations but form neither a straight nor a flush.

Find the number of hands in each of the above cases.
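These counts follow from the combination formulas; the sketch below assumes the slide's convention that the ace may rank low or high (so each suit contains 10 straight sequences) and checks that the ten classes partition all C(52,5) unordered hands:

```python
import math

royal_flush    = 4
straight_flush = 4 * 10 - royal_flush                         # 36
four_kind      = 13 * 48                                      # 624
full_house     = 13 * math.comb(4, 3) * 12 * math.comb(4, 2)  # 3744
flush          = 4 * math.comb(13, 5) - 40                    # 5108 (no straight flushes)
straight       = 10 * 4**5 - 40                               # 10200 (mixed suits)
three_kind     = 13 * math.comb(4, 3) * math.comb(12, 2) * 4**2   # 54912
two_pairs      = math.comb(13, 2) * math.comb(4, 2)**2 * 44       # 123552
one_pair       = 13 * math.comb(4, 2) * math.comb(12, 3) * 4**3   # 1098240
# Bust: 5 distinct non-consecutive denominations, suits not all equal
bust           = (math.comb(13, 5) - 10) * (4**5 - 4)             # 1302540

total = (royal_flush + straight_flush + four_kind + full_house + flush
         + straight + three_kind + two_pairs + one_pair + bust)
print(total == math.comb(52, 5))  # the ten classes cover every hand exactly once
```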
End of Chapter
Thank you for your attention!
II. Probability
What is Probability?

Since pre-historic times, mankind has been aware of deterministic phenomena:
- daily sunrises and sunsets
- tides at sea shores
- phases of the moon
- seasonal changes in weather
- annual flooding of the Nile

Mankind has also noticed random phenomena:
- results of coin tosses
- results of rolling dice
- results of horse races

Legends and folk tales from all over the world refer to dice games and gambling.
What is Probability?

Modern mankind also loves to gamble:
- state-run lotteries
- casinos
- betting parlors, etc.

Probabilistic notions are commonplace in everyday language. We use words such as:
- probable/improbable; possible/impossible
- certain/uncertain; likely/unlikely

Phrases such as:
“there is a 50-50 chance”
“the odds are 7 to 4 against”
“the probability of precipitation is 20%”
are understood by most people.
What is Probability?

What is the probability that a coin comes down Heads when it is tossed?
Almost everyone answers 1/2. Why 1/2?
- “Because there are two sides to the coin”
- “Because when tossed repeatedly, the coin will turn up Heads half the time”

This is called the classical approach to probability.
Justification: the symmetry principle, also called the principle of indifference (or of insufficient reason).

If we toss two coins, are there three outcomes or four outcomes?
- {0 Heads, 1 Head, 2 Heads}?
- {(T,T), (T,H), (H,T), (H,H)}?
Note that 2 Heads has probability 1/3 or 1/4 depending on the choice.
What is Probability?

Suppose multiple tosses have resulted in 50% Heads. Setting P(Head) = 1/2 is the relative frequency approach to probability.
- If an outcome x occurs Nx times in N trials, its relative frequency is Nx/N, and we define its probability P(x) to be Nx/N.
- Does there exist a probability of Heads for a mint-new, untossed coin? Or do probabilities come into existence only after multiple tosses?
- How large should N be?
- Are probabilities re-defined after each toss?
- Many assertions about probability are essentially statements of beliefs.
- A fair coin is one for which P(Heads) = 1/2, but how do we know whether a given coin is fair?
- Symmetry of the physical object is a belief; that further tosses of a coin for which P(Heads) = 1/2 will result in 50% Heads is a belief.
What is Probability?

Consider a dice game that played an important role in the historical development of probability.

The famous letters between Pascal and Fermat, which many believe started the serious study of probability, were instigated by a request for help from a French nobleman and gambler, the Chevalier de Méré.

It is said that de Méré had been betting that, in four rolls of a die, at least one six would turn up. He was winning consistently and, to get more people to play, he changed the game to bet that, in 24 rolls of two dice, a pair of sixes would turn up. It is claimed that de Méré lost with 24 and felt that 25 rolls were necessary to make the game favorable.

Later we shall compute the following probabilities:
P(“at least one 6 in 4 rolls”) = 0.518
P(“at least one double six in 24 rolls”) = 0.491
P(“at least one double six in 25 rolls”) = 0.506
Experiments

An experiment that can result in different outcomes, even though it is repeated in the same manner every time, is called a random experiment.

The set of all possible outcomes of a random experiment is called the sample space, and each outcome is called a trial. The sample space is denoted by Ω and a trial by ω.

Example: Tossing a coin: Ω = { H, T }
Example: Rolling a die: Ω = { 1, 2, 3, 4, 5, 6 }
Example: Phone calls to the police (in 24 hours): Ω = { 0, 1, 2, 3, … } = N
Example: Measuring a noise voltage: Ω = { x : −1 ≤ x ≤ 1 }
Example: Brownian motion of a particle: Ω = { (x(t), y(t), z(t)), t∈[0, T] }
Events
A subset of Ω is called an event.
Example: A = {2, 4, 6} and B = {2, 3, 5} are said to be events defined on the sample space Ω = {1, 2, 3, 4, 5, 6}.
“Events defined on the sample space” is merely a probabilist’s way of saying “subsets of the sample space”.
Aᶜ = {1, 3, 5} and Bᶜ = {1, 4, 6} are also events defined on Ω.
A sample space Ω of n elements has 2^n different events (subsets).

Two special events:
- Ω can be regarded as a subset of Ω. On any trial, the event Ω always occurs. The event Ω is called the certain event or the sure event.
- ∅, the empty set, is also a subset of Ω. On any trial, the event ∅ never occurs. The event ∅ is called the null event or the impossible event.
Experiments and Events

An event containing a single outcome (trial) is called an elementary event or singleton event. Elementary events cannot happen simultaneously.

Example: Rolling 2 dice (showing that an experiment can have more than one sample space):
Ω1 = { (x,y), x,y = 1, 2, …, 6 }
Ω2 = { 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 } (sum)
Ω3 = { “even sum”, “odd sum” }
Ω4 = { “same numbers”, “different numbers”, “sum 7” }

Ω1, Ω2, Ω3 are elementary event spaces. Ω4 is not an elementary event space, because the events “different numbers” and “sum 7” can happen simultaneously. Ω1 is the most informative elementary event space.
Event Operations
Intersection: A ∩ B (also written A⋅B); union: A ∪ B; complement: Aᶜ; difference: A\B (also written A−B).

Event A implies event B (A⊆B) iff whenever A occurs, B also occurs.

Example: Rolling 2 dice, Ω1.
A = { (x,y), x=y } (same numbers); B = { (x,y), x+y=2k } (even sum)
The following relations hold:
A ∩ B = A, A ∪ B = B, Aᶜ = { (x,y), x≠y }, A⊆B
B\A = { (1,3), (1,5), (2,4), (2,6), (3,1), (3,5), (4,2), (4,6), (5,1), (5,3), (6,2), (6,4) }
Probability Axioms

Probabilities are numbers assigned to events that satisfy the following rules:
- Axiom I: P(A) ≥ 0 for all events A
- Axiom II: P(Ω) = 1
- Axiom III: If events A and B are disjoint, then P(A ∪ B) = P(A) + P(B)

Consequences of the axioms:
P(∅) = 0;
0 ≤ P(A) ≤ 1;
P(Aᶜ) = 1 − P(A);
If A⊆B, then P(A) ≤ P(B);
P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Discrete Probability
For an elementary sample space Ω with n elements ω1, ω2, …, ωn, the probabilities of events depend on the probabilities of the outcomes:

Classical approach: Each outcome has probability 1/n; P(ωi) = 1/n for all i.
P(A) = |A|/n, where |A| = number of elements in A.
Special cases: P(Ω) = n/n = 1, P(∅) = 0/n = 0.

Nonclassical approach: The n outcomes have probabilities p1, p2, …, pn, where pi ≥ 0 and Σpi = 1.
P(A) = sum of the pi over all members of A.
Special cases: P(∅) = 0 as before; P(Ω) = p1 + p2 + … + pn = 1.
Examples (1)

What is the probability that when rolling 2 dice we get: a) sum 7, b) different numbers, c) sum greater than 8?

P(“sum 7”) = 6/36 ≈ 0.1667,  P(“different numbers”) = 30/36 ≈ 0.8333,  P(“sum > 8”) = 10/36 ≈ 0.2778
We deal 3 playing cards. What is the probability of getting: a) exactly one ace, b) at least one ace, c) at least one club, d) a blackjack?

P(“one ace”) = C(4,1)C(48,2) / C(52,3) ≈ 0.204

P(“at least one ace”) = [ C(4,1)C(48,2) + C(4,2)C(48,1) + C(4,3)C(48,0) ] / C(52,3) ≈ 0.217

Better approach:
P(“at least one ace”) = 1 − P(“no ace”) = 1 − C(4,0)C(48,3) / C(52,3) ≈ 0.217

P(“at least one club”) = 1 − C(13,0)C(39,3) / C(52,3) ≈ 0.587

P(“blackjack”, an ace together with a ten-valued card in a two-card deal) = C(4,1)C(16,1) / C(52,2) = 64/1326 ≈ 0.048
Examples (2)
How many people do we need in a room to make it a favorable bet (probability of success greater than 1/2) that two people in the room have the same birthday? Since there are 365 possible birthdays, it is tempting to guess that we would need about half this number, or 183. You would surely win this bet. In fact, the number required for a favorable bet is only 23.

P(“some two have the same birthday”) = 1 − P(“all birthdays are different”) = 1 − [365 ⋅ 364 ⋅ 363 ⋯ (365−k+1)] / 365^k

k (people)   Probability
10           0.1169
20           0.4115
23           0.5073
30           0.7064
40           0.8912
50           0.9704
Compute the probabilities that in four rolls of a die at least one six turns up, and that in 24 rolls of two dice a pair of sixes turns up (Chevalier de Méré).

P(“at least one 6 in 4 rolls”) = 1 − P(“no 6 in 4 rolls”) = 1 − 5⁴/6⁴ ≈ 0.518

P(“at least one double six in 24 rolls”) = 1 − P(“no double six in 24 rolls”) = 1 − 35²⁴/36²⁴ ≈ 0.491
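Both the birthday table and de Méré's bets are short computations; a Python sketch (the function name `p_shared_birthday` is illustrative):

```python
def p_shared_birthday(k):
    """P(at least two of k people share a birthday), 365 equally likely days."""
    p_distinct = 1.0
    for i in range(k):
        p_distinct *= (365 - i) / 365
    return 1 - p_distinct

print(round(p_shared_birthday(23), 4))  # ~0.5073, first k past the 1/2 mark

# Chevalier de Mere's bets
p_one_six    = 1 - (5 / 6) ** 4       # at least one 6 in 4 rolls of a die
p_double_six = 1 - (35 / 36) ** 24    # at least one double six in 24 rolls
print(round(p_one_six, 3), round(p_double_six, 3))  # 0.518 0.491
```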
Estimates of Probabilities (1)

- Based on past experimental results, we can use the observed relative frequency as an estimate of the probability of an event.
- If an experiment is repeated N times and event A occurs NA times, the relative frequency of the event A is W(A) = NA/N.
- Estimates are usually reasonably good if the number of trials is large. Still, on N trials the relative frequency that we observe might easily differ from the true probability by as much as N^(−1/2) or more.
- Suppose that a fair coin is tossed a million times. Is there a logical reason why the coin will not turn up Heads each and every time?
- Answer 1: No, there is no logical reason why it couldn’t, but it is very unlikely.
- Answer 2 (wrong): Yes, if the coin is fair, there is no way that it can turn up Heads a million times in a row.
Estimates of Probabilities (2)
If we toss two coins, are there three outcomes or four outcomes?
- { 0 Heads, 1 Head, 2 Heads }?
- { (T,T), (T,H), (H,T), (H,H) }?
Note that 2 Heads has probability 1/3 or 1/4 depending on the choice (if the outcomes are considered equally likely).

Only the observed relative frequency tells us which probability to choose. So, initial estimates of outcome probabilities are based on observed relative frequencies.

All of probability theory reduces to methods of calculating the probability of events composed of outcomes related to an experiment.

In the case of two coins, the relative frequency is W((H,H)) ≈ 1/4.
Estimates of Probabilities (3)

- A lot of effort was expended in trying to define probability as the limit of the relative frequency:
  P(A) = lim (N→∞) NA/N
- Unfortunately, the limit does not exist in a mathematical sense.
- Physically, we will only ever observe a finite-length prologue of the sequence of trials.
- The nonclassical approach cannot be used at all for uncountably infinite sample spaces.
- Example: Pick a random number between 0 and 1. Ω = { x | 0 < x < 1 }
- Consider a million random numbers obtained from rand(). A particular outcome, say 0.703546789, will either not have occurred in these million trials, or it will have occurred just once:
  P{0.703546789} = 10^(−6) or 0
Estimates of Probabilities (4)
- The relative frequency estimate seems to be converging to 0 as the number of trials increases!
- The only model that works for uncountably infinite sample spaces is for each outcome to have probability 0.
- But, on each trial, some outcome occurs, doesn’t it? So where are the probabilities?
- For rand(), P{a < outcome < b} = b−a.
- The nonzero probabilities are assigned to intervals of the line, not to outcomes!
- In the physical sciences and engineering, the real numbers are a model for many phenomena that are discrete at the microscopic level. This usually causes no problems, and the model usually gives the correct answers.
- We all understand that V = 1.235 volts really means 1.2345 ≤ V ≤ 1.2355 volts.
God made the integers

Kronecker: “God made the integers; all else is the work of man.”
- Human beings usually choose rational numbers when asked for a number in (0,1).
- A physical measurement made with an instrument will yield a rational number.
- rand() returns “real numbers” that are actually rational numbers.
- All this is because of finite precision.

Do real numbers exist?
- The real number line is a mathematical construction that models the real world very well indeed.
- If the volume electrical charge density is ρ, the charge in a volume Δv is just ρΔv.
- For Δv very small, ρΔv is smaller than the charge of an electron, so the model cannot be right for small volumes (or densities)! (Paradoxes)
- But it is convenient!
What about P(arbitrary subset)?
- If every outcome is an event of probability zero, then isn’t it true that any event A must also have probability zero?
  P(A) = sum of the probabilities of all the outcomes that comprise A = 0 + 0 + … = 0?
- No, the above is a mis-application of Axiom III (which applies to countable unions only).
- Since each outcome has probability zero, a countable event, that is, an event that has a countable number of outcomes, also has probability zero (by Axiom III).
- Axiom III does not say that the probability of an uncountable event is the sum of the probabilities of the outcomes.
- For uncountably infinite sample spaces, a consistent probability assignment to all the subsets of Ω is not possible.
Asking the right question

- The nonzero probabilities are assigned to intervals of the line, not to outcomes!
- In most physical applications, the question “Does x = 0.213482774099070267623…?” is meaningless.
- If x were 0.213482774099070267624… instead, the airplane would still fly, the bridge would still stand, the modem would still connect.
- In most instances, we are satisfied if x is in some specified range (design specs).
- “Does x ∈ (a,b)?” is the right question!

Example: Choose a random number between 0 and 1.
P{a < outcome < b} = b−a; P{0.4 < outcome < 0.6} = 0.2
N calls to rand() give N numbers, roughly 20% of which are in the interval (0.4, 0.6). At most one (and most likely none!) of these will be 0.57689231.
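This is easy to see empirically; a simulation sketch (seed and sample size are arbitrary choices):

```python
import random

random.seed(0)
N = 100_000
draws = [random.random() for _ in range(N)]

# Relative frequency of landing in (0.4, 0.6); should be close to b - a = 0.2
freq = sum(1 for x in draws if 0.4 < x < 0.6) / N
print(round(freq, 3))

# Any single pre-chosen number essentially never occurs
hits = sum(1 for x in draws if x == 0.57689231)
print(hits)  # almost surely 0
```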
Geometric Probability
Let Ω be infinite and uncountable, representing some geometrical object in R (the line), R² (the plane), R³ (space), R⁴ (4-dimensional space), etc., and let S ⊆ Ω be a random event. Then the probability of S is the number

P(S) = m(S) / m(Ω),

where m(A) is a measure of the set A (length in R, area in R², volume in R³, etc.).

Example: Two friends arrange a meeting between 10 and 11 o’clock. What is the probability that they meet, if each will wait for the other at most 20 minutes?

Ω = { (x,y), x,y∈[0,60] }
S: |x − y| < 20
P = (60² − 40²)/60² = 5/9 ≈ 0.556

(The favorable region is the band |x − y| < 20 inside the 60×60 square; its complement consists of two right triangles with legs of length 40.)
Examples

Example: What is the probability that the equation x² + ax + b = 0 has real solutions for randomly chosen a,b∈[0,1]?

Ω = { (a,b), a,b∈[0,1] }
S: a² ≥ 4b, i.e., b ≤ a²/4
P = ∫₀¹ (a²/4) da = 1/12 ≈ 0.083

Example: A stick of length l is randomly broken into three parts. What is the probability that the parts can form a triangle?

Ω = { (x,y), x,y > 0, x+y < l }  (the parts are x, y, and l−x−y)
S: x < y + (l−x−y) ⇒ x < l/2
   y < x + (l−x−y) ⇒ y < l/2
   (l−x−y) < x + y ⇒ x+y > l/2
P = [½ (l/2)(l/2)] / [½ l⋅l] = 1/4 = 0.25
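Both geometric answers can be checked by Monte Carlo simulation; a sketch (seed and sample size are arbitrary, and the stick is broken at two uniform points, which matches the uniform-triangle model above):

```python
import random

random.seed(1)
N = 200_000

# Meeting problem: arrival times uniform on [0, 60], meet if |x - y| < 20
meet = sum(1 for _ in range(N)
           if abs(random.uniform(0, 60) - random.uniform(0, 60)) < 20) / N
print(round(meet, 2))  # exact value 5/9

# Broken stick: break points u, v uniform on [0, 1]; the three pieces form
# a triangle iff every piece is shorter than 1/2
def forms_triangle(u, v):
    a, b = min(u, v), max(u, v)
    pieces = (a, b - a, 1 - b)
    return max(pieces) < 0.5

tri = sum(1 for _ in range(N)
          if forms_triangle(random.random(), random.random())) / N
print(round(tri, 2))  # exact value 1/4
```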
Conditional Probability
- The experiment has been performed and we know that the event A occurred, that is, the outcome is some member of A.
- Question: What are the chances that B occurred, in view of the new knowledge that event A is known to have occurred?
- One (naive) answer to this question is that the chances that B occurred are still what they always were, viz. P(B). But:
- If AB = ∅: obviously, if A occurred, B cannot have occurred.
- If AB = A: obviously, if A occurred, B must also have occurred.
Definition

- The conditional probability of B given A is denoted by P(B|A).
- Read this as “the probability of B given A” or “the probability of B conditioned on A”. A is called the conditioning event.
- Definition: If P(A) > 0, P(B|A) is defined as
  P(B|A) = P(AB) / P(A)
- If AB = ∅: if A occurred, B cannot have occurred, and indeed P(B|A) = P(AB)/P(A) = 0.
- If AB = A: if A occurred, B must also have occurred, and indeed P(B|A) = P(AB)/P(A) = P(A)/P(A) = 1.
Motivation

The definition of conditional probability is motivated by considerations arising from the relative frequency viewpoint.
- Suppose that N independent trials of the experiment have been performed.
- Let NA denote the number of trials on which event A occurred.
- Let NB denote the number of trials on which event B occurred.
- Event AB occurred on NAB trials.
- Consider only those NA trials on which A occurred and ignore the rest. B occurred on NAB of the NA trials on which A occurred, so the relative frequency of B on those NA trials is NAB/NA.

W(B|A) = NAB/NA = (NAB/N) / (NA/N) = W(AB)/W(A)

Conditional probability = relative frequency on a restricted set of trials.
Examples

Example: Two fair dice are rolled.
What is the probability that the sum of the two faces is 6, given that the dice are showing different faces?
What is the probability that the sum of the two faces is 7, given that the dice are showing different faces?
What is the probability that the first die is showing a 6, given that the dice are showing different faces?

A = “different faces”, P(A) = 30/36
B = “sum of the two faces is 6”, AB = { (1,5), (5,1), (2,4), (4,2) }
C = “sum of the two faces is 7”, AC = { (1,6), (6,1), (2,5), (5,2), (3,4), (4,3) }
D = “first die is showing a 6”, AD = { (6,1), (6,2), (6,3), (6,4), (6,5) }

P(B|A) = (4/36)/(30/36) = 2/15 < P(B) = 5/36
P(C|A) = (6/36)/(30/36) = 1/5 > P(C) = 1/6
P(D|A) = (5/36)/(30/36) = 1/6 = P(D)
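These conditional probabilities can be computed exactly by enumerating the 36 outcomes; a sketch with exact fractions (the helper names are illustrative):

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # all 36 outcomes (x, y)

def prob(event):
    """Classical probability: favorable outcomes over 36."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def cond_prob(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(lambda w: event(w) and given(w)) / prob(given)

different = lambda w: w[0] != w[1]
print(cond_prob(lambda w: sum(w) == 6, different))  # 2/15
print(cond_prob(lambda w: sum(w) == 7, different))  # 1/5
print(cond_prob(lambda w: w[0] == 6, different))    # 1/6
```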
Axioms
Conditional probabilities are a probability measure, that is, they satisfy the axioms of probability theory:
- Axiom I: 0 ≤ P(B|A) ≤ 1 for all events B.
  Since AB ⊆ A, 0 ≤ P(AB) ≤ P(A) ⇒ 0 ≤ P(B|A) = P(AB)/P(A) ≤ 1
- Axiom II: P(Ω|A) = 1.
  Since AΩ = A, P(AΩ) = P(A) ⇒ P(Ω|A) = P(AΩ)/P(A) = 1
- Similarly for Axiom III.

An expression such as P((B ∪ C)|A) is commonly written as P(B ∪ C|A).
Beginners’ mistake: if B and C are disjoint, they write P(B ∪ C|A) = P(B) + P(C|A). NOT!
Some Rules

- P(Bᶜ|A) = 1 − P(B|A)
- If B ⊆ C, then P(B|A) ≤ P(C|A)
- If BC = ∅, then P((B ∪ C)|A) = P(B|A) + P(C|A)
- More generally, P((B ∪ C)|A) = P(B|A) + P(C|A) − P(BC|A)
- Even if A, B, C, and D are disjoint, P(B ∪ C|A ∪ D) ≠ P(B) + P(C|A) + P(D)

OK, so you can update your probabilities to conditional probabilities if you know that event A occurred.
- Is that all there is to it? Is the notion of conditional probability just a one-trick pony? Surely life holds more than that?

Actually, conditional probabilities are fundamental tools in probabilistic analyses.
Chain Rule
- P(B|A) = P(AB)/P(A), and symmetrically P(A|B) = P(AB)/P(B)
- Hence P(AB) = P(B|A)P(A) = P(A|B)P(B)
- More generally, P(ABCD…) = P(A)P(B|A)P(C|AB)P(D|ABC)…
- The product of the first two terms is P(AB); P(C|AB)P(AB) = P(ABC), so the product of the first three terms is P(ABC), and so on.
- Every probability result also applies to conditional probabilities. The chain rule applies to the computation of conditional probabilities by conditioning everything on the given event H (say):
  P(ABCD…|H) = P(A|H)P(B|AH)P(C|ABH)P(D|ABCH)…
Total Probability

From A = AB ∪ ABᶜ ⇒ P(A) = P(AB) + P(ABᶜ).
On the other hand, P(AB) = P(A|B)P(B) and P(ABᶜ) = P(A|Bᶜ)P(Bᶜ), so
P(A) = P(A|B)P(B) + P(A|Bᶜ)P(Bᶜ)
and symmetrically
P(B) = P(B|A)P(A) + P(B|Aᶜ)P(Aᶜ)

- This result allows us to find unconditional probabilities from conditional probabilities.
- It is a fundamentally important result. It is also very simple (it uses horse sense).
- This fundamental result is called the theorem of total probability.
- The probability of the event A is the weighted average of the probabilities of A conditioned on B and on Bᶜ.
Example

Example: Box I has 3 green and 2 red balls, while Box II has 2 green and 2 red balls. A ball is drawn at random from Box I and transferred to Box II. Then a ball is drawn at random from Box II. What is the probability that the ball drawn from Box II is green? Note that the color of the ball transferred from Box I to Box II is not known.

After the transfer, Box II has 5 balls in it.
G = event the ball drawn from Box II is green; A = event the transferred ball is red.
P(G|A) = 2/5, P(G|Aᶜ) = 3/5, P(A) = 2/5
P(G) = P(G|A)P(A) + P(G|Aᶜ)P(Aᶜ) = (2/5)(2/5) + (3/5)(3/5) = 13/25
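The same computation with exact fractions (variable names are illustrative):

```python
from fractions import Fraction

# A = transferred ball is red; Box I holds 3 green, 2 red
p_a = Fraction(2, 5)

# Box II (2 green, 2 red) plus the transferred ball holds 5 balls
p_g_given_a     = Fraction(2, 5)  # red transferred: still 2 green of 5
p_g_given_not_a = Fraction(3, 5)  # green transferred: 3 green of 5

# Theorem of total probability
p_g = p_g_given_a * p_a + p_g_given_not_a * (1 - p_a)
print(p_g)  # 13/25
```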
Total Probability

Given a partition A1, A2, …, An of the sample space (Ai ∩ Aj = ∅ for i ≠ j, and A1 ∪ A2 ∪ … ∪ An = Ω), then
P(B) = P(B|A1)P(A1) + P(B|A2)P(A2) + … + P(B|An)P(An)

The theorem as presented previously is the case n = 2 of this more general result.

If P(B|Aj) is the smallest of the P(B|Ai), then replacing each P(B|Ai) by P(B|Aj) gives
P(B) ≥ P(B|Aj)⋅[ P(A1) + P(A2) + … + P(An) ] = P(B|Aj)
If P(B|Ak) is the largest of the P(B|Ai), then replacing each P(B|Ai) by P(B|Ak) gives
P(B) ≤ P(B|Ak)⋅[ P(A1) + P(A2) + … + P(An) ] = P(B|Ak)
Conclusion: min_j P(B|Aj) ≤ P(B) ≤ max_i P(B|Ai)
Example

Example: Box I has 2 green and 4 red balls, while Box II has 4 green and 2 red balls. One ball from Box I and two balls from Box II are transferred to Box III. Then a ball is drawn at random from Box III. What is the probability that the ball drawn from Box III is green?

B = the ball drawn from Box III is green
A0 = Box III has 0 green balls; P(A0) is unimportant because P(B|A0) = 0
A1 = Box III has 1 green ball (Box I gives green and Box II gives 2 red, or Box I gives red and Box II gives 1 green 1 red):
  P(A1) = (2/6)(C(2,2)/C(6,2)) + (4/6)(C(4,1)C(2,1)/C(6,2)) = (2/6)(1/15) + (4/6)(8/15) = 17/45
A2 = Box III has 2 green balls (Box I gives green and Box II gives 1 green 1 red, or Box I gives red and Box II gives 2 green):
  P(A2) = (2/6)(C(4,1)C(2,1)/C(6,2)) + (4/6)(C(4,2)C(2,0)/C(6,2)) = (2/6)(8/15) + (4/6)(6/15) = 20/45
A3 = Box III has 3 green balls (Box I gives green and Box II gives 2 green):
  P(A3) = (2/6)(C(4,2)C(2,0)/C(6,2)) = (2/6)(6/15) = 6/45

P(B) = P(B|A1)P(A1) + P(B|A2)P(A2) + P(B|A3)P(A3) = (1/3)(17/45) + (2/3)(20/45) + (3/3)(6/45) = 5/9
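The whole chain of hypergeometric and total-probability steps can be reproduced exactly; a sketch (helper names are illustrative):

```python
from fractions import Fraction
from math import comb

# Box I: 2 green, 4 red; Box II: 4 green, 2 red.
# One ball from Box I and two from Box II are moved to Box III.
def p_from_II(g):
    """P(exactly g of the 2 balls taken from Box II are green)."""
    return Fraction(comb(4, g) * comb(2, 2 - g), comb(6, 2))

p_g1 = Fraction(2, 6)  # transferred Box I ball is green

# A_k = Box III holds k green balls
p_a = {
    1: p_g1 * p_from_II(0) + (1 - p_g1) * p_from_II(1),  # 17/45
    2: p_g1 * p_from_II(1) + (1 - p_g1) * p_from_II(2),  # 20/45
    3: p_g1 * p_from_II(2),                              # 6/45
}

# Theorem of total probability: P(B|A_k) = k/3 for a draw from 3 balls
p_b = sum(Fraction(k, 3) * p for k, p in p_a.items())
print(p_b)  # 5/9
```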
Bayes’ formula

- Given that event A of probability P(A) > 0 occurred, the conditional probability of B given A is denoted by P(B|A) and defined as P(B|A) = P(AB)/P(A).
- What is P(A|B)?

P(A|B) = P(AB)/P(B) = P(B|A) P(A) / P(B)

- This is the simplest version of Bayes’ formula.
- Bayes’ formula P(A|B) = P(B|A)P(A)/P(B) is also called Bayes’ theorem, or Bayes’ lemma, or often (mistakenly) Bayes’ rule.
- Bayes’ rule refers to a methodology for decision-making that is an extremely controversial topic among statisticians.
Bayes’ formula
- When P(B) is obtained from the P(B|Ak)’s via the more general version of the theorem of total probability, the more general total probability appears in the denominator.
- The numerator is still one of the terms in the denominator:

P(Ak|B) = P(B|Ak) P(Ak) / P(B) = P(B|Ak) P(Ak) / [ P(B|A1)P(A1) + P(B|A2)P(A2) + … + P(B|An)P(An) ]
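Applied to the Box III example, Bayes' formula inverts the conditioning: having seen a green ball, how likely is each composition of Box III? A sketch with exact fractions (the priors 17/45, 20/45, 6/45 are taken from that example):

```python
from fractions import Fraction

# Priors P(A_k): Box III holds k green balls
prior = {1: Fraction(17, 45), 2: Fraction(20, 45), 3: Fraction(6, 45)}
# Likelihoods P(B|A_k) = k/3 when drawing one of the 3 balls
likelihood = {k: Fraction(k, 3) for k in prior}

# Denominator: theorem of total probability (equals 5/9)
p_b = sum(likelihood[k] * prior[k] for k in prior)

# Bayes' formula for each hypothesis
posterior = {k: likelihood[k] * prior[k] / p_b for k in prior}
print(posterior[2])  # P(A_2 | B) = 8/15
```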
Independence
- Repeated independent trials: the outcome of any trial of the experiment does not influence or affect the outcome of any other trial.
- The trials are said to be physically independent.
- Physical independence is a belief. It cannot be proved that the trials are independent; we can only believe it.
- The belief in independence is reflected in the assignment of probabilities to the events of the compound experiment.
- If the trials are (believed to be) independent, then we set
  P(A, B, C, Aᶜ, …) = P(A)P(B)P(C)P(Aᶜ)…
- Both A and Aᶜ cannot occur on the same trial of the simple experiment: here they occur on different subexperiments.
Independence

- Definition: Events A and B defined on an experiment are said to be (stochastically) mutually independent if
  P(A ∩ B) = P(A)P(B)
- Sometimes people say “A is independent of B” instead, but independence is mutual: A is independent of B if and only if B is independent of A.
- If we believe that events A and B are physically independent, then we insist that this equality holds.
- Physical independence is, in essence, a property of the events themselves. We believe that events A and B are physically independent, and we express this independence via P(AB) = P(A)P(B).
- Stochastic independence is a property of the probability measure, and does not necessarily mean that the events are physically independent.
Independence
- If A and B are mutually independent events, then
  P(B|A) = P(AB)/P(A) = P(A)P(B)/P(A) = P(B), and
  P(A|B) = P(AB)/P(B) = P(A)
- The conditional probability of B given A is the same as the unconditional probability! Knowing that A occurred does not cause any “updating” of the chances of B.
- If A and B are mutually independent events, then P(AB) = P(A)P(B). If A and B are mutually exclusive events, then P(AB) = 0.
- For mutually exclusive events, P(B|A) = 0. Knowing that A occurred guarantees that B did not occur! Thus, mutually exclusive events (of positive probability) cannot also be mutually independent.
- Many people (and textbook authors!) feel that P(B|A) = P(B) is a much more natural definition of the notion of independence: “B is independent of A if P(B|A) = P(B)”.
- But then A and B seem to have different roles, mutuality of independence is not obvious, and the definition assumes that P(A) > 0.
20
Independence
• Physical independence (which is a belief, remember?) of A
  and B implies stochastic independence: we insist that we
  must have P(AB) = P(A)P(B).
• But if we do not have any reason to believe that A and B are
  physically independent, and our calculations reveal that
  P(AB) = P(A)P(B), we should not automatically assume that A and B
  are also physically independent.
• If A and B are mutually independent events, then
  P(AB) = P(A)P(B). This is equivalent to each of the following:
  - P(AB^c) = P(A)P(B^c)
  - P(A^cB) = P(A^c)P(B)
  - P(A^cB^c) = P(A^c)P(B^c)
Examples (1)
Playing cards. Take a card from a standard deck of 52 cards. Are
the following events independent: A = "ace taken", B = "heart taken"?
P(A) = 4/52; P(B) = 13/52;
AB = "heart ace taken"; P(AB) = 1/52
P(AB) = 1/52 = (4/52)(13/52) = P(A)P(B) ⇒ A and B are independent
Let us include one joker in the deck.
P(A) = 4/53; P(B) = 13/53;
AB = "heart ace taken"; P(AB) = 1/53
P(AB) = 1/53 ≠ (4/53)(13/53) = P(A)P(B) ⇒ A and B are dependent
A minor change in the probability space destroyed the
independence of A and B!
Are A and B physically independent?
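The card example is small enough to check by enumerating both probability spaces directly; a minimal sketch:

```python
from fractions import Fraction as F

# Enumerate the deck and test whether P(AB) == P(A)P(B) exactly.
def independent(extra_cards):
    deck = [(rank, suit) for rank in range(13) for suit in range(4)]
    deck += extra_cards                   # e.g. a joker, modeled as (None, None)
    n = len(deck)
    p_a = F(sum(1 for r, s in deck if r == 0), n)             # rank 0 = ace
    p_b = F(sum(1 for r, s in deck if s == 0), n)             # suit 0 = hearts
    p_ab = F(sum(1 for r, s in deck if r == 0 and s == 0), n)
    return p_ab == p_a * p_b

print(independent([]))               # True:  52-card deck
print(independent([(None, None)]))   # False: adding one joker destroys it
```

Exact fractions matter here: with floating point, 1/53 vs. (4/53)(13/53) would compare unreliably.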
Examples (2)
Exclusive-OR gates. Let A and B respectively denote the events that inputs
#1 and #2 of an Exclusive-OR gate are logical 1.
Assume that A and B are physically independent (hence they are
stochastically independent) events. Assume that P(A) = P(B) = 0.5.
Let C denote the event that the output of the Exclusive-OR gate is logical 1:
C = A⊕B = AB^c ∪ A^cB; P(A) = P(B) = 0.5;
P(C) = P(AB^c) + P(A^cB) = P(A)P(B^c) + P(A^c)P(B) = 0.5⋅0.5 + 0.5⋅0.5 = 0.5
Are A and C independent events?
P(AC) = P(A(AB^c ∪ A^cB)) = P(AB^c) = P(A)P(B^c) = 0.5⋅0.5 = 0.25 = P(A)P(C)
Is the output of the XOR gate really independent of the input?
• The output is stochastically independent of the input
• The output is physically dependent on the input
• Physical independence (such as A and B being independent) is a belief
• Stochastic independence is an artifact of the probability measure
Examples (2)
Exclusive-OR gates. Let A and B respectively denote the events that
inputs #1 and #2 of an Exclusive-OR gate are logical 1.
Assume that A and B are physically independent (hence they are
stochastically independent) events.
C = A⊕B = AB^c ∪ A^cB; P(A) = P(B) = 0.500001;
P(C) = P(A)P(B^c) + P(A^c)P(B) = 2⋅0.500001⋅0.499999 = 0.499999999998
P(AC) = P(AB^c) = P(A)P(B^c) = 0.500001⋅0.499999 = 0.249999999999 ≠
P(A)P(C) = 0.250000499998…
• A minor change in the probabilities of A and B from P(A) = P(B) = 0.5
  to P(A) = P(B) = 0.500001 destroyed the independence of A and C!
• It would be hard to distinguish between the two cases via
  experimentation
• The occurrence of stochastic independence of A and C does not imply
  that A and C are physically independent
• The output of an XOR gate does depend on its input
Series Independent Trials
• The k trials on which A occurs in a series of n trials can be
  specified by stating the subset (of size k) of {1, 2, 3, …, n} on
  which A occurred.
• How many such subsets are there? C(n, k) = n!/(k!(n−k)!).
• If p = P(A), the probability that A occurs on k specified trials and
  does not occur on the other n−k trials is p^k (1−p)^(n−k), so
  P{A occurs on k trials out of n} = C(n, k) p^k (1−p)^(n−k)
• Generalization: the probability that A occurs on n_a trials, B
  occurs on n_b trials, C occurs on n_c trials, … in n trials is
  P{A on n_a, B on n_b, C on n_c, … out of n} =
  = n!/(n_a! n_b! n_c! …) ⋅ P(A)^n_a P(B)^n_b P(C)^n_c …
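Both counting formulas above translate directly into a few lines; this is a generic sketch, not tied to any particular example in the slides:

```python
from math import comb, factorial, prod

def binom_prob(n, k, p):
    """P{A occurs on exactly k of n independent trials}, with P(A) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def multinom_prob(counts, probs):
    """P{outcome i occurs counts[i] times in sum(counts) trials}."""
    n = sum(counts)
    ways = factorial(n) // prod(factorial(k) for k in counts)
    return ways * prod(p**k for p, k in zip(probs, counts))

print(binom_prob(5, 3, 0.5))              # 0.3125
print(multinom_prob([3, 2], [0.5, 0.5]))  # same event, multinomial form: 0.3125
```

With two outcomes the multinomial formula reduces to the binomial one, which the last two lines confirm numerically.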
Series Independent Trials
• The most probable number of occurrences of an event A in a series
  of n trials is
  [ n⋅P(A) − (1 − P(A)) ],  where [x] denotes the whole part of x.
Example. What is more probable: for a chess match between equal
players to finish 3:2 or 5:5?
P("3 : 2") = C(5, 3) ⋅ 0.5^3 (1 − 0.5)^2 = 0.3125
P("5 : 5") = C(10, 5) ⋅ 0.5^5 (1 − 0.5)^5 = 0.2461
Examples (1)
Compute the probability that in four rolls of a die at least one six
turns up, and the probability that in 24 rolls of two dice a pair of
sixes turns up. (Chevalier de Méré)
P("at least one 6 in 4 rolls") =
= C(4,1)(1/6)^1(5/6)^3 + C(4,2)(1/6)^2(5/6)^2 + C(4,3)(1/6)^3(5/6)^1 + C(4,4)(1/6)^4(5/6)^0 =
= (500 + 150 + 20 + 1)/1296 = 0.5177
More directly:
P("at least one 6 in 4 rolls") = 1 − P("no 6 in 4 rolls") =
= 1 − C(4,0)(1/6)^0(5/6)^4 = 0.5177
P("at least one 2×6 in 24 rolls") = 1 − P("no 2×6 in 24 rolls") =
= 1 − C(24,0)(1/36)^0(35/36)^24 = 0.4914
What is the probability that among 50 people in a bus there will be more
than 2 bald people, if we know that baldness in the population is 13%?
A = "more than 2 bald", A^c = "0, 1 or 2 bald"
P(A) = 1 − P(A^c) =
= 1 − C(50,0)⋅0.13^0⋅0.87^50 − C(50,1)⋅0.13^1⋅0.87^49 − C(50,2)⋅0.13^2⋅0.87^48 =
= 1 − 0.0009462 − 0.0070690 − 0.0258790 = 0.9661058
End of Chapter 2
Thank you for your attention!

III. Random Variables
Random Variables
• A random variable is a numerical description of the
  outcome of an experiment.
  A random variable X maps ω ∈ Ω to the number X(ω).
• A random variable can be classified as being either
  discrete or continuous depending on the numerical values
  it assumes.
• A discrete random variable may assume either a finite
  number of values or an infinite sequence of values.
• A continuous random variable may assume any numerical
  value in an interval or collection of intervals.

Examples
• The random variable is always denoted as X, never as X(ω).
• It is often convenient not to display the arguments of the
  functions when it is the functional relationship that is of
  interest (e.g., d(uv) = u⋅dv + v⋅du).
• Discrete random variable with a finite number of values:
  Let x = number of TV sets sold at the store in one day,
  where x can take on 5 values (0, 1, 2, 3, 4).
• Discrete random variable with an infinite sequence of values:
  Let x = number of customers arriving in one day,
  where x can take on the values 0, 1, 2, …
  We can count the customers arriving, but there is no finite
  upper limit on the number that might arrive.
Discrete Probability Distributions
• The probability distribution for a random variable describes
  how probabilities are distributed over the values of the random
  variable.
• The probability distribution is defined by a probability function,
  denoted by f(x), which provides the probability for each value
  of the random variable:
  f(x): ( x1  x2  x3  …  xn )
        ( p1  p2  p3  …  pn ),   Σf(xi) = ΣP(X=xi) = Σpi = 1
• All the probabilistic information about the discrete random variable X
  is summarized in its probability function.
• The probability function can be used to answer questions such as
  "What is the probability that X has a value between a and b?"
  "What is the probability that X is an even number?"
Example: JSL Appliances
Using past data on TV sales (below left), a tabular representation
of the probability distribution for TV sales (below right) was
developed.

  Units Sold   N. of Days        x    f(x)
      0            80            0    .40
      1            50            1    .25
      2            40            2    .20
      3            10            3    .05
      4            20            4    .10
                  200                1.00

Graphical representation of the probability distribution: a bar
chart of f(x) against x (bars of height .40, .25, .20, .05, .10).

What is the probability that in a day the number of sold units will be
less than 2?
P(X < 2) = P(X = 0) + P(X = 1) = 0.4 + 0.25 = 0.65
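The JSL table can be built from the day counts in a few lines; computing E(X) and Var(X) the same way also previews the next slides:

```python
# Probability function from the observed day counts (units sold -> days).
days = {0: 80, 1: 50, 2: 40, 3: 10, 4: 20}
n = sum(days.values())                      # 200 observed days
f = {x: c / n for x, c in days.items()}     # relative frequencies

p_less_2 = f[0] + f[1]                                   # P(X < 2)
mean = sum(x * p for x, p in f.items())                  # E(X)
var = sum(x**2 * p for x, p in f.items()) - mean**2      # E(X^2) - E(X)^2
print(round(p_less_2, 2), round(mean, 2), round(var, 2))  # 0.65 1.2 1.66
```

The same 0.65, 1.2 and 1.66 reappear in the expected-value and variance slides below.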
Expected Value
• The expected value, or mean, of a random variable is
  a measure of its central location.
• Expected value of a discrete random variable:
  E(X) = μ = Σ xi⋅f(xi) = Σ xi⋅P(X = xi)
• Features of expected value:
  - E(c) = c
  - E(cX) = cE(X)

Variance (Dispersion)
• The variance summarizes the variability in the values
  of a random variable.
• Variance of a discrete random variable:
  Var(X) = D(X) = σ² = E(X − E(X))² = Σ(xi − μ)²f(xi)
• Features of variance:
  - D(c) = 0
  - D(cX) = c²D(X)
  - D(X) = EX² − (EX)²
• The standard deviation, σ, is defined as the positive
  square root of the variance.
Example: JSL Appliances
• Expected Value of a Discrete Random Variable

   x    f(x)   x⋅f(x)
   0    .40     .00
   1    .25     .25
   2    .20     .40
   3    .05     .15
   4    .10     .40
             E(x) = 1.20

The expected number of TV sets sold in a day is 1.2
Example: JSL Appliances
• Variance and Standard Deviation of a Discrete Random Variable

   x    x−μ    (x−μ)²   f(x)   (x−μ)²f(x)   x²f(x)
   0   −1.2     1.44     .40      .576        .00
   1   −0.2     0.04     .25      .010        .25
   2    0.8     0.64     .20      .128        .80
   3    1.8     3.24     .05      .162        .45
   4    2.8     7.84     .10      .784       1.60
                              σ² = 1.660

Equivalently, σ² = E(X²) − (E(X))² = 3.10 − 1.2² = 1.66.
The variance of daily sales is 1.66 TV sets squared.
The standard deviation of sales is 1.2884 TV sets.
Discrete Uniform Probability Distribution
• The discrete uniform probability distribution is the simplest
  example of a discrete probability distribution given by a
  formula:
  f(x) = 1/n,  i.e.  f(x): ( x1   x2   x3  …  xn  )
                           ( 1/n  1/n  1/n …  1/n )
• Note that the values of the random variable are equally likely.
  E(x) = μ = (1/n)Σxi,   Var(X) = (1/n)Σ(xi − μ)²
Binomial Probability Distribution
• Properties of a Binomial Experiment
  - The experiment consists of a sequence of n identical trials.
  - Two outcomes, success and failure, are possible on each trial.
  - The probability of a success, denoted by p, does not change
    from trial to trial.
  - The trials are independent.

  f(x) = n!/(x!(n−x)!) ⋅ p^x (1−p)^(n−x)

  E(X) = μ = np,   Var(X) = σ² = np(1 − p)

  where: f(x) = the probability of x successes in n trials
         n = the number of trials
         p = the probability of success on any one trial
Example: Evans Electronics
• Using the Binomial Probability Function
Evans is concerned about a low retention rate for employees. On the
basis of past experience, management has seen a turnover of 10% of
the hourly employees annually. Thus, for any hourly employee chosen
at random, management estimates a probability of 0.1 that the
person will not be with the company next year.
Choosing 3 hourly employees at random, what is the probability that 1
of them will leave the company within the year?
Let: p = .10, n = 3, x = 1
f(1) = 3!/(1!(3−1)!) ⋅ (0.1)^1(0.9)^2 = (3)(0.1)(0.81) = 0.243
E(x) = μ = 3(.1) = .3 employees out of 3
Var(x) = σ² = 3(.1)(.9) = .27
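A short sketch of the binomial pmf behind the Evans Electronics numbers; the loop reproduces a few columns of the n = 3 probability table on the next slide:

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial probability of x successes in n trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

f1 = binom_pmf(1, 3, 0.1)
print(round(f1, 3))                      # 0.243, as computed by hand above
for p in (0.10, 0.25, 0.50):             # a few columns of the n = 3 table
    print(p, [binom_pmf(x, 3, p) for x in range(4)])
```

Each printed row lists f(0)…f(3) for one value of p and should match the corresponding table column up to rounding.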
Example: Evans Electronics
• Using the Tables of Binomial Probabilities (n = 3)

                                    p
  x     .10    .15    .20    .25    .30    .35    .40    .45    .50
  0   .7290  .6141  .5120  .4219  .3430  .2746  .2160  .1664  .1250
  1   .2430  .3251  .3840  .4219  .4410  .4436  .4320  .4084  .3750
  2   .0270  .0574  .0960  .1406  .1890  .2389  .2880  .3341  .3750
  3   .0010  .0034  .0080  .0156  .0270  .0429  .0640  .0911  .1250
Example: Evans Electronics
• Using a Tree Diagram
Each of the three workers either Leaves (L, probability .1) or Stays
(S, probability .9). Following each of the 8 paths through the tree
gives the value of x (the number who leave) and its probability:
  L L L :              x = 3, prob .0010
  L L S, L S L, S L L : x = 2, prob .0090 each
  L S S, S L S, S S L : x = 1, prob .0810 each
  S S S :              x = 0, prob .7290
Poisson Probability Distribution
• Properties of a Poisson Experiment
  - The probability of an occurrence is the same for any two intervals
    of equal length.
  - The occurrence or nonoccurrence in any interval is independent of
    the occurrence or nonoccurrence in any other interval.

  f(x) = μ^x e^(−μ) / x!

  where: f(x) = probability of x occurrences in an interval
         μ = mean number of occurrences in an interval, e = 2.71828
Example: Mercy Hospital
• Using the Poisson Probability Function
Patients arrive at the emergency room of Mercy Hospital
at the average rate of 6 per hour on weekend evenings.
What is the probability of 4 arrivals in 30 minutes on a
weekend evening?
μ = 6/hour = 3/half-hour, x = 4
f(4) = 3^4 (2.71828)^(−3) / 4! = .1680
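The Poisson calculation above, as a minimal sketch:

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """Poisson probability of x occurrences when the mean is mu."""
    return mu**x * exp(-mu) / factorial(x)

p4 = poisson_pmf(4, 3.0)    # 4 arrivals, mean 3 per half-hour
print(round(p4, 4))          # 0.168, the table value .1680
```

Changing `mu` reproduces any column of the Poisson table on the next slide.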
Example: Mercy Hospital
• Using the Tables of Poisson Probabilities

                                       μ
  x    2.1     2.2     2.3     2.4     2.5     2.6     2.7     2.8     2.9     3.0
  0  .1225   .1108   .1003   .0907   .0821   .0743   .0672   .0608   .0550   .0498
  1  .2572   .2438   .2306   .2177   .2052   .1931   .1815   .1703   .1596   .1494
  2  .2700   .2681   .2652   .2613   .2565   .2510   .2450   .2384   .2314   .2240
  3  .1890   .1966   .2033   .2090   .2138   .2176   .2205   .2225   .2237   .2240
  4  .0992   .1082   .1169   .1254   .1336   .1414   .1488   .1557   .1622   .1680
  5  .0417   .0476   .0538   .0602   .0668   .0735   .0804   .0872   .0940   .1008
  6  .0146   .0174   .0206   .0241   .0278   .0319   .0362   .0407   .0455   .0504
  7  .0044   .0055   .0068   .0083   .0099   .0118   .0139   .0163   .0188   .0216
  8  .0011   .0015   .0019   .0025   .0031   .0038   .0047   .0057   .0068   .0081
Hypergeometric Probability Distribution
• The hypergeometric distribution is closely related to the binomial
  distribution.
• With the hypergeometric distribution, the trials are not independent,
  and the probability of success changes from trial to trial.

  f(x) = C(r, x)⋅C(N−r, n−x) / C(N, n),   for 0 ≤ x ≤ r

  where: f(x) = probability of x successes in n trials
         n = number of trials
         N = number of elements in the population
         r = number of elements in the population labeled success
Example: Neveready
• Hypergeometric Probability Distribution
Bob Neveready has removed two dead batteries from a flashlight and
inadvertently mingled them with the two good batteries he intended as
replacements. The four batteries look identical.
Bob now randomly selects two of the four batteries. What is the
probability he selects the two good batteries?

f(2) = C(r, x)⋅C(N−r, n−x) / C(N, n) = C(2, 2)⋅C(2, 0) / C(4, 2) = 1/6 = .167

where: x = 2 = number of good batteries selected
       n = 2 = number of batteries selected
       N = 4 = number of batteries in total
       r = 2 = number of good batteries in total
Example: Lottery
• Hypergeometric Probability Distribution
In the Macedonian lottery 7 numbers are pulled out of 39. What is the
probability that a player filling in one column will have k guesses?
Of the player's 7 picks, k must come from the 7 drawn (winning) numbers
and the remaining 7−k from the 32 losing numbers:

f(k) = C(7, k)⋅C(32, 7−k) / C(39, 7),   for k = 0, 1, …, 7

   k       0        1        2        3        4        5         6          7
 P(X=k)  0.21883  0.41242  0.27494  0.08183  0.01129  0.00068   0.0000146  0.000000065
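The whole lottery table follows from the stated formula; a quick sketch that also confirms the eight probabilities sum to 1:

```python
from math import comb

# Hypergeometric pmf for the 7-of-39 lottery: k winning guesses out of 7 picks.
def lottery_pmf(k):
    return comb(7, k) * comb(32, 7 - k) / comb(39, 7)

probs = [lottery_pmf(k) for k in range(8)]
print([round(p, 5) for p in probs])
print(sum(probs))    # a valid pmf must sum to 1
```

Exactly one guess (k = 1) is the most likely outcome, just as the table shows.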
Continuous Probability Distributions
• A continuous random variable can assume any value in an
  interval on the real line or in a collection of intervals.
• It is not possible to talk about the probability of the random
  variable assuming a particular value. Instead, we talk about
  the probability of the random variable assuming a value
  within a given interval.
• The probability of the random variable assuming a value
  within some given interval from x1 to x2 is defined to be
  the area under the graph of the probability density
  function between x1 and x2.
Discrete vs. Continuous
• Probability function (discrete)
  f(x): ( x1  x2  x3  …  xn )
        ( p1  p2  p3  …  pn ),   Σpi = 1
  P(a ≤ x ≤ b) = Σ_{i: xi ∈ [a,b]} pi
  E(X) = μ = Σ_{i=1}^{n} xi pi
  Var(X) = Σ_{i=1}^{n} (xi − μ)² pi = Σ_{i=1}^{n} xi² pi − μ²
• Probability density function (continuous)
  f(x): R → R,   ∫_{−∞}^{+∞} f(x) dx = 1
  P(a ≤ x ≤ b) = ∫_{a}^{b} f(x) dx
  E(X) = μ = ∫_{−∞}^{+∞} x f(x) dx
  Var(X) = ∫_{−∞}^{+∞} (x − μ)² f(x) dx = ∫_{−∞}^{+∞} x² f(x) dx − μ²
Example
The number of minutes a train will be late is given by the following
density function:

  f(x) = (3/500)(25 − x²)  for −2 ≤ x ≤ 5,
  f(x) = 0                 otherwise

Find the probability that the train will be late more than 2 minutes,
the mean and the variance.

P(x > 2) = ∫_{2}^{5} (3/500)(25 − x²) dx =
= (75/500)x |_{2}^{5} − (x³/500) |_{2}^{5} = 108/500 = 0.216

E(x) = ∫_{−2}^{5} (3x/500)(25 − x²) dx =
= (75x²/(2⋅500)) |_{−2}^{5} − (3x⁴/(4⋅500)) |_{−2}^{5} =
= 1575/1000 − 1827/2000 = 0.6615

Var(x) = ∫_{−2}^{5} (3x²/500)(25 − x²) dx − 0.6615² =
= (25x³/500) |_{−2}^{5} − (3x⁵/(5⋅500)) |_{−2}^{5} − 0.4376 =
= 3325/500 − 9471/2500 − 0.4376 = 2.424
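The three integrals above can be checked numerically with a simple midpoint rule, no external libraries needed:

```python
# Midpoint-rule integration; fine enough for these smooth polynomial integrands.
def integrate(g, a, b, steps=200000):
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

f = lambda x: 3 / 500 * (25 - x * x)          # the density on [-2, 5]

p_late = integrate(f, 2, 5)                    # P(x > 2)
mean = integrate(lambda x: x * f(x), -2, 5)    # E(x)
var = integrate(lambda x: x * x * f(x), -2, 5) - mean ** 2
print(round(p_late, 3), round(mean, 4), round(var, 3))   # 0.216 0.6615 2.424
```

All three values agree with the hand-computed antiderivatives.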
Uniform Probability Distribution
• A random variable is uniformly distributed whenever the probability
  is proportional to the interval's length.
• Uniform Probability Density Function:
  f(x) = 1/(b − a)  for a ≤ x ≤ b,
  f(x) = 0          otherwise

  E(x) = (a + b)/2
  Var(x) = (b − a)²/12

  where: a = smallest value the variable can assume
         b = largest value the variable can assume
Example: Slater's Buffet
• Uniform Probability Distribution
Slater customers are charged for the amount of salad they
take. Sampling suggests that the amount of salad taken is
uniformly distributed between 5 ounces and 15 ounces.
The probability density function is
  f(x) = 1/10  for 5 ≤ x ≤ 15,
  f(x) = 0     elsewhere,
where x = salad plate filling weight (oz.)

P(12 < x < 15) = (1/10)(3) = .3
E(x) = (a + b)/2 = (5 + 15)/2 = 10
Var(x) = (b − a)²/12 = (15 − 5)²/12 = 8.33
Normal Probability Distribution
• Graph of the Normal Probability Density Function

  f(x) = (1/(√(2π)σ)) e^(−(x−μ)²/(2σ²))

  where: μ = mean
         σ = standard deviation
         π = 3.14159
         e = 2.71828

  The density is a bell curve centered at μ; a larger σ (e.g. σ = 4
  vs. σ = 2) gives a wider, flatter curve.
Normal Probability Distribution
• Characteristics of the Normal Probability Distribution
  - The shape of the normal curve is often illustrated as a
    bell-shaped curve.
  - Two parameters, μ (mean) and σ (standard deviation),
    determine the location and shape of the distribution.
  - The highest point on the normal curve is at the mean, which
    is also the median and mode.
  - The mean can be any numerical value: negative, zero, or
    positive.
  - The normal curve is symmetric.
  - The standard deviation determines the width of the curve:
    larger values result in wider, flatter curves.
  - The total area under the curve is 1 (.5 to the left of the mean
    and .5 to the right).
Standard Normal Probability Distribution
• A random variable that has a normal distribution with a mean
  of zero and a standard deviation of one is said to have a
  standard normal probability distribution.
• The letter z (or N(0,1)) is commonly used to designate this
  normal random variable.
• Converting to the Standard Normal Distribution:
  z = (x − μ)/σ
  With μ = 0, σ = 1:  f(x) = (1/√(2π)) e^(−x²/2)
• We can think of z as a measure of the number of standard
  deviations x is from μ.
Example: Pep Zone
• Standard Normal Probability Distribution
Pep Zone sells auto parts and supplies including a popular
multi-grade motor oil. When the stock of this oil drops to 20
gallons, a replenishment order is placed.
The store manager is concerned that sales are being lost
due to stockouts while waiting for an order. It has been
determined that leadtime demand is normally distributed with
a mean of 15 gallons and a standard deviation of 6 gallons.
The manager would like to know the probability of a stockout,
P(x > 20).

z = (x − μ)/σ = (20 − 15)/6 = .83
The Standard Normal table shows an area of .2967 for the
region between z = 0 and z = .83. The shaded tail area is
.5 − .2967 = .2033. The probability of a stockout is .2033.
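The same tail probability can be computed directly from the normal CDF (Φ expressed via `math.erf`) instead of the table; the exact z = 5/6 gives a slightly smaller answer than the lookup with z rounded to .83:

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """Phi((x - mu)/sigma), the normal CDF, via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

p_stockout = 1 - normal_cdf(20, mu=15, sigma=6)
print(round(p_stockout, 4))    # about 0.2023; the rounded table lookup gave .2033
```

The 0.001 gap is purely the rounding of z from 0.8333 to 0.83.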
Example: Pep Zone
• Using the Standard Normal Probability Table

  z    .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
  .0  .0000  .0040  .0080  .0120  .0160  .0199  .0239  .0279  .0319  .0359
  .1  .0398  .0438  .0478  .0517  .0557  .0596  .0636  .0675  .0714  .0753
  .2  .0793  .0832  .0871  .0910  .0948  .0987  .1026  .1064  .1103  .1141
  .3  .1179  .1217  .1255  .1293  .1331  .1368  .1406  .1443  .1480  .1517
  .4  .1554  .1591  .1628  .1664  .1700  .1736  .1772  .1808  .1844  .1879
  .5  .1915  .1950  .1985  .2019  .2054  .2088  .2123  .2157  .2190  .2224
  .6  .2257  .2291  .2324  .2357  .2389  .2422  .2454  .2486  .2518  .2549
  .7  .2580  .2612  .2642  .2673  .2704  .2734  .2764  .2794  .2823  .2852
  .8  .2881  .2910  .2939  .2967  .2995  .3023  .3051  .3078  .3106  .3133
  .9  .3159  .3186  .3212  .3238  .3264  .3289  .3315  .3340  .3365  .3389
Example: Pep Zone
• Standard Normal Probability Distribution
If the manager of Pep Zone wants the probability of a
stockout to be no more than .05, what should the reorder
point be?
Let z.05 represent the z value cutting off the upper .05 tail area
(so the area between 0 and z.05 is .45).
Example: Pep Zone
We now look up the .4500 area in the Standard Normal
Probability table to find the corresponding z.05 value.

  z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
  1.5  .4332  .4345  .4357  .4370  .4382  .4394  .4406  .4418  .4429  .4441
  1.6  .4452  .4463  .4474  .4484  .4495  .4505  .4515  .4525  .4535  .4545
  1.7  .4554  .4564  .4573  .4582  .4591  .4599  .4608  .4616  .4625  .4633
  1.8  .4641  .4649  .4656  .4664  .4671  .4678  .4686  .4693  .4699  .4706
  1.9  .4713  .4719  .4726  .4732  .4738  .4744  .4750  .4756  .4761  .4767

z.05 = 1.645 is a reasonable estimate. The corresponding
value of x is given by x = μ + z.05σ = 15 + 1.645(6) = 24.87
A reorder point of 24.87 gallons will place the probability of a
stockout during leadtime at .05.
Exponential Probability Distribution
• Exponential Probability Density Function:
  f(x) = (1/μ) e^(−x/μ),  x ≥ 0, μ > 0
• Cumulative Exponential Distribution Function:
  P(x ≤ x0) = 1 − e^(−x0/μ)
  where x0 = some specific value of x

Example
The time between arrivals of cars at Al's Carwash follows an
exponential probability distribution with a mean time between
arrivals of 3 minutes. Al would like to know the probability
that the time between two successive arrivals will be 2
minutes or less.
P(x ≤ 2) = 1 − e^(−2/3) = 1 − .5134 = .4866
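The carwash calculation, as a one-function sketch of the exponential CDF:

```python
from math import exp

def expon_cdf(x0, mu):
    """P(x <= x0) for an exponential distribution with mean mu."""
    return 1 - exp(-x0 / mu)

p = expon_cdf(2, 3)     # mean time between arrivals: 3 minutes
print(round(p, 4))      # 0.4866
```

Any other interval probability follows the same way, e.g. `expon_cdf(b, 3) - expon_cdf(a, 3)` for P(a ≤ x ≤ b).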
Example: Al's Carwash
• Graph of the Probability Density Function
The density f(x) = (1/3)e^(−x/3) decays from 1/3 at x = 0; the
shaded area to the left of x = 2 is P(x ≤ 2) = .4866.
(x-axis: time between successive arrivals, minutes)
Relationship between the Poisson and Exponential Distributions
If the Poisson distribution provides an appropriate description of
the number of occurrences per interval, then the exponential
distribution provides an appropriate description of the length of
the interval between occurrences.
Chi-Square Distribution
• If X1, X2, …, Xn are independent random variables with z (N(0,1))
  distribution, then the random variable
  χ² = X1² + X2² + … + Xn²
  has a Chi-Square Distribution with n degrees of freedom:

  χ²: f(x) = (1/(2^(n/2) Γ(n/2))) x^(n/2 − 1) e^(−x/2),  x > 0,
  where Γ(α) = ∫_0^∞ y^(α−1) e^(−y) dy

  E(x) = n,  Var(x) = 2n
• For n = 2 we have the Exponential PDF.
  General (Gamma) form: f(x) = λ⋅exp(−λx)⋅(λx)^(t−1)/Γ(t) for x > 0
  (graph shown for λ = 1, t = 3).
Student Distribution
If X has N(0,1) and Y has χ² with n degrees of freedom, then
  t = X√n / √Y
has a Student Distribution with n degrees of freedom (X and Y are
independent random variables).

  t: f(x) = (Γ((n+1)/2) / (Γ(n/2)√(nπ))) ⋅ (1 + x²/n)^(−(n+1)/2),
  where Γ(α) = ∫_0^∞ y^(α−1) e^(−y) dy

The t density for n = 1, 3, 8 is close to (but heavier-tailed than)
the normal density with μ = 0, σ = 1.
Fisher Distribution
If X has χ² with n1 degrees of freedom and Y has χ² with n2
degrees of freedom, then
  F = (X/n1) / (Y/n2)
has a Fisher Distribution with (n1, n2) degrees of freedom (X and Y
are independent random variables).

  F: f(x) = (n1^(n1/2) n2^(n2/2) / Β(n1/2, n2/2)) ⋅
            ⋅ x^((n1−2)/2) (n2 + n1x)^(−(n1+n2)/2),  x > 0

(Fα denotes the value cut off by an upper tail area α of the Fisher
distribution.)
Joint Distributions
• The generalization from one random variable to two random
  variables is the most challenging intellectual concept.
  Once the two random variable case is understood, the
  extension of the ideas to many random variables is easy.
• Let X and Y denote two random variables defined on the
  same sample space Ω.
• The outcome ω ∈ Ω is mapped to the real number X(ω) by
  the random variable X, and to the real number Y(ω) by the
  random variable Y.
• Jointly, the random variables X and Y are said to map the
  outcome ω ∈ Ω to the point (X(ω), Y(ω)) in the plane.

Joint Distributions
• The individual probabilistic descriptions of X and Y
  are insufficient to determine the probabilistic behavior
  of the random point (X, Y) in the plane.
• The random point (X, Y) is also called
  - the bivariate random variable (X, Y)
  - the joint random variable (X, Y)
  - the random vector (X, Y)
• The joint probability density function (joint PDF) for
  discrete random variables X and Y taking on values
  u1, u2, …, un, … and v1, v2, …, vm, … respectively is
  defined as
  P_X,Y(u, v) = P{X = u, Y = v} = P{{X = u} ∩ {Y = v}}.
Example

         X:   0    1    2    3  | P(Y)
  Y = 0:    .02  .05  .10  .03  |  .20
  Y = 1:    .04  .09  .13  .08  |  .34
  Y = 2:    .05  .15  .17  .09  |  .46
  P(X):     .11  .29  .40  .20  |
Covariance and Correlation
• The covariance is a measure of the linear association
  between two random variables X and Y:
  s_XY = E((X − EX)(Y − EY)) = … = E(XY) − EX⋅EY
  If X and Y are independent, s_XY = 0.
  E(XY) = EX⋅EY + s_XY
  Var(X + Y) = Var(X) + Var(Y) + 2s_XY
• The correlation coefficient is a normalized measure of
  the linear association between two random variables:
  ρ_XY = s_XY / √(Var(X)Var(Y)),   |ρ_XY| ≤ 1
  If X and Y are independent, ρ_XY = 0.
Examples
Scatter plots of (x, y) samples illustrate the correlation coefficient:
• a positive relationship: ρ_XY > 0.8
• a negative relationship: ρ_XY < −0.8
• no apparent relationship: −0.5 < ρ_XY < 0.5
• no linear relationship (e.g. a curved pattern): −0.5 < ρ_XY < 0.5
Example
Using the joint distribution of the previous example:
E(X) = 1(.05+.09+.15) + 2(.10+.13+.17) + 3(.03+.08+.09) = 1.69
E(Y) = 1(.04+.09+.13+.08) + 2(.05+.15+.17+.09) = 1.26
Var(X) = .29⋅1² + .40⋅2² + .20⋅3² − 1.69² = 0.83
Var(Y) = .34⋅1² + .46⋅2² − 1.26² = 0.59
E(XY) = 1⋅.09 + 2⋅.13 + 3⋅.08 + 2⋅.15 + 4⋅.17 + 6⋅.09 = 2.11
ρ_XY = (E(XY) − E(X)E(Y)) / √(Var(X)Var(Y)) =
     = (2.11 − 1.69⋅1.26) / √(0.83⋅0.59) = −0.03
Conclusion: there is no linear relationship between X and Y.
Example
Five equally likely points (2,1), (4,0), (6,7), (1,4), (3,6), i.e.
P(X = xi, Y = yj) = 0.2 for i = j and 0 for i ≠ j.
E(X) = (2 + 4 + 6 + 1 + 3)/5 = 3.2
E(Y) = (1 + 0 + 7 + 4 + 6)/5 = 3.6
Var(X) = (2² + 4² + 6² + 1² + 3²)/5 − 3.2² = 2.96
Var(Y) = (1² + 0² + 7² + 4² + 6²)/5 − 3.6² = 7.44
E(XY) = (2 + 0 + 42 + 4 + 18)/5 = 13.2
ρ_XY = (E(XY) − E(X)E(Y)) / √(Var(X)Var(Y)) =
     = (13.2 − 3.2⋅3.6) / √(2.96⋅7.44) = 0.36
Conclusion: there is a weak linear relationship between X and Y.
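The five-point correlation above can be reproduced in a few lines:

```python
from math import sqrt

# The five equally likely (x, y) points from the example.
pts = [(2, 1), (4, 0), (6, 7), (1, 4), (3, 6)]
n = len(pts)

ex = sum(x for x, _ in pts) / n
ey = sum(y for _, y in pts) / n
var_x = sum(x * x for x, _ in pts) / n - ex ** 2
var_y = sum(y * y for _, y in pts) / n - ey ** 2
cov = sum(x * y for x, y in pts) / n - ex * ey        # E(XY) - E(X)E(Y)
rho = cov / sqrt(var_x * var_y)
print(round(rho, 2))    # 0.36
```

The same pattern works for any discrete joint distribution; only the point list and weights change.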
End of Chapter 3
Thank you for your attention!

IV. Central Limit Theorem
Chebyshev Inequality
• (Chebyshev Inequality) Let X be a random variable with
  finite expected value μ = E(X) and finite variance σ². Then for
  any positive number ε > 0 we have
  P(|X − μ| ≥ ε) ≤ σ²/ε²
• If ε = kσ (k standard deviations) for some integer k, then
  P(|X − μ| ≥ kσ) ≤ σ²/(k²σ²) = 1/k²
• % of values in some commonly used intervals:
  - at least (1 − 1/4) ≡ 75% of the values of a random variable are
    within ±2 standard deviations of its mean
  - at least (1 − 1/16) ≡ 93.75% of the values are within
    ±4 standard deviations of its mean
  - at least (1 − 1/36) ≡ 97.2% of the values are within
    ±6 standard deviations of its mean
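Chebyshev's bound holds for any distribution, which a quick simulation illustrates (the exponential here is an arbitrary choice):

```python
import random, statistics

# Empirical check: the fraction of values within k standard deviations of
# the mean is always at least 1 - 1/k^2, whatever the distribution.
random.seed(1)
data = [random.expovariate(1.0) for _ in range(10000)]
mu = statistics.fmean(data)
sigma = statistics.pstdev(data)

for k in (2, 4, 6):
    frac = sum(abs(x - mu) < k * sigma for x in data) / len(data)
    assert frac >= 1 - 1 / k**2      # the Chebyshev guarantee
    print(k, round(frac, 4))
```

For most distributions the actual fractions are far above the bound; Chebyshev is deliberately conservative because it assumes nothing beyond a finite variance.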
Law of Large Numbers
• (Law of Large Numbers) Let X1, X2, …, Xn be an independent
  trials process with finite expected value μ and finite variance σ².
  Let Sn = X1 + X2 + … + Xn be the sum of the Xi. Then for any
  ε > 0 we have
  lim_{n→∞} P(|Sn/n − μ| ≥ ε) = 0,
  or equivalently lim_{n→∞} P(|Sn/n − μ| < ε) = 1
• Note that Sn/n is an average of the individual outcomes, and one
  often calls the Law of Large Numbers the "law of averages". It is a
  striking fact that we can start with a random experiment about
  which little can be predicted and, by taking averages, obtain an
  experiment in which the outcome can be predicted with a high
  degree of certainty.
Example
Let us consider the special case of tossing a coin n times, with Sn
the number of heads that turn up. Then the random variable Sn/n
represents the fraction of times heads turns up and will have values
between 0 and 1.
In the figures, the distribution is plotted for increasing values of
n, with the outcomes between .45 and .55 marked by dots at the top of
the spikes. We see that as n increases the distribution gets more and
more concentrated around .5, and a larger and larger percentage of
the total area is contained within the interval (.45, .55), as
predicted by the Law of Large Numbers.
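The coin example is easy to simulate; a minimal sketch:

```python
import random

# Simulate n coin tosses and report the fraction of heads, Sn/n.
random.seed(0)
for n in (100, 10000):
    s_n = sum(random.randint(0, 1) for _ in range(n))
    print(n, s_n / n)    # the fraction of heads; close to 0.5 for large n
```

For n = 10000 the standard deviation of Sn/n is only 0.005, so runs land very close to .5, exactly the concentration the figures show.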
Central Limit Theorem
• The second fundamental theorem of probability is the Central Limit
  Theorem. This theorem says that if Sn is the sum of n mutually
  independent random variables (each of which contributes a small
  amount to the total), then the distribution function of Sn is
  well-approximated by a normal density function
  f(x) = (1/(√(2π)σ)) e^(−(x−μ)²/(2σ²)),
  which for μ = 0, σ = 1 becomes f(x) = (1/√(2π)) e^(−x²/2).
  (A classic illustration: the distribution of heights of adult women.)
Central Limit Theorem
• (Central Limit Theorem) Let X1, X2, …, Xn be a sequence of
  independent random variables with common mean μ and variance σ²,
  and let Sn = X1 + X2 + … + Xn. If there exists a constant A such
  that |Xi| ≤ A for all i, then
  lim_{n→∞} P( (Sn/n − μ)/(σ/√n) < b ) = (1/√(2π)) ∫_{−∞}^{b} e^(−y²/2) dy
• The above theorem essentially says that anything that can be
  thought of as being made up as the sum of many small independent
  pieces is approximately normally distributed.
Example
Suppose we choose n random numbers from the interval [0, 1] with
uniform density. Let X1, X2, …, Xn denote these choices, and
Sn = X1 + X2 + … + Xn their sum. Then the density function for Sn
tends to have a normal shape, but is centered at n/2:
  P( (Sn/n − 1/2)/(σ/√n) < x ) → (1/√(2π)) ∫_{−∞}^{x} e^(−y²/2) dy
(Density plotted for n = 2, 3, 4, 10.)
General Problem
To calculate probabilities of different events related to some
experiment, it is best to know the probability density function. The
PDF enables us to compute the probability of each event.
Two approaches to estimate a PDF:
• Parametric: we know the PDF form and only need to calculate the
  PDF parameters.
  Example: we know that the PDF is normal but we do not know μ and σ.
  Problem: the supposition about the PDF form.
• Non-parametric: we do not know anything about the PDF form.
  Example: we build a histogram and approximate the PDF.
  Problem: we need a large number of samples.
Fortunately
For a large enough sample size n, according to the Central Limit
Theorem we can consider the sample mean to have a normal
distribution. In practice it is enough to have n ≥ 30.
The Central Limit Theorem justifies the supposition of a normal
distribution for many quantities, because they are influenced by a
number of unknown random factors, none of which has a decisive
influence on the outcome.
Everybody believes in the correctness of the Central Limit Theorem
and the use of the normal distribution for large enough samples:
  Mathematicians: because it is an experimental fact!
  Engineers and practitioners: because it is mathematically proven!
Variations
• Chi-Square Distribution:
  χ² = (n − 1)s²/σ², where s² = (1/(n−1)) Σ_{i=1}^{n} (xi − x̄)²,
  has n − 1 degrees of freedom (X1, X2, …, Xn have normal distribution).
• If X has χ² with n1 degrees of freedom and Y has χ² with n2
  degrees of freedom, then F = (X/n1)/(Y/n2) has a Fisher
  Distribution with (n1, n2) degrees of freedom.
• If X has N(0,1) and Y has χ² with n − 1 degrees of freedom, then
  t = X√(n−1)/√Y has a Student distribution; in particular
  t = (x̄ − μ)/(s/√n)
  has a Student distribution with n − 1 degrees of freedom.
These variations of the Normal distribution are extremely useful
because they have only one unknown parameter (to estimate): in the
first two it is σ, while in the third one it is μ.
End of Chapter 4

V. Statistics

Statistics
• Statistics uses probability methods to deal with empirical data
  obtained by measurements, observations, etc.
• Statistics is in some sense the opposite of probability:
  - in probability we examine concrete events and situations
    according to the model (Ω, ℑ, P)
  - in statistics we try to build the model using concrete events
    and situations (statistical data)
• Statistical methods are based on a random sample:
  - n samples are taken from the population and analyzed
  - the obtained results enable us to infer about the whole population
Statistical Inference
• The purpose of statistical inference is to obtain information
  about a population from information contained in a sample.
• A population is the set of all the elements of interest.
• A sample is a subset of the population.
• The sample results provide only estimates of the values of the
  population characteristics.
• A parameter is a numerical characteristic of a population.
• With proper sampling methods, the sample results will provide
  "good" estimates of the population characteristics.
Simple Random Sampling
• Finite Population
  - A simple random sample from a finite population of size N is a
    sample selected such that each possible sample of size n has the
    same probability of being selected.
  - Replacing each sampled element before selecting subsequent
    elements is called sampling with replacement.
  - Sampling without replacement is the procedure used most often.
  - In large sampling projects, computer-generated random numbers
    are often used to automate the sample selection process.
• Infinite Population
  - A simple random sample from an infinite population is a sample
    selected independently from the same population.
  - The population is usually considered infinite if it involves an
    ongoing process that makes listing every element impossible.
  - The random number selection procedure cannot be used for
    infinite populations.
2
Some Statistical Tasks
Example: Let us have n independent trials of an experiment E in
which an event A occurs m times. How do we estimate P = P(A)?
• Point estimation of unknown parameters
  - P̂ = m/n ≈ P is a good estimate of the unknown probability P.
• Interval estimation of unknown parameters
  - We should find P̲ and P̄ such that P("P ∈ [P̲, P̄]") ≈ 1.
    For example P̲ = (m − 2)/n and P̄ = (m + 2)/n.
• Hypothesis Testing
  - Hypothesis H₀: P = P̂ = m/n and hypothesis H₁: P ≠ P̂ = m/n.
    Should we accept H₀?
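The point and interval estimates above can be sketched in a few lines of Python. This is a minimal sketch of the slide's rough interval (m ± 2)/n; the values m = 55 and n = 100 below are made-up illustrations, not data from the text.

```python
# Point and interval estimate of an unknown probability P = P(A):
# P_hat = m/n, with the slide's rough interval [(m - 2)/n, (m + 2)/n].
def estimate_probability(m, n):
    p_hat = m / n            # point estimate of P
    lower = (m - 2) / n      # rough lower bound
    upper = (m + 2) / n      # rough upper bound
    return p_hat, lower, upper

p_hat, lo, hi = estimate_probability(m=55, n=100)   # 0.55, 0.53, 0.57
```

With more trials n, the interval (m ± 2)/n shrinks, reflecting growing confidence in P̂.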
End of Chapter 5
3
VI
Descriptive
Statistics
Descriptive Statistics:
Tabular and Graphical Methods
• Summarizing Qualitative Data
  - Frequency Distribution
  - Relative Frequency
  - Percent Frequency Distribution
  - Bar Graph
  - Pie Chart
• Summarizing Quantitative Data
  - Frequency Distribution
  - Relative Frequency and Percent Frequency Distributions
  - Histogram
  - Cumulative Distributions
  - Ogive
• Exploratory Data Analysis
• Crosstabulations and Scatter Diagrams
1
Frequency Distribution
• A frequency distribution is a tabular summary of data showing
  the frequency (or number) of items in each of several
  nonoverlapping classes.
• The relative frequency of a class is the fraction or proportion
  of the total number of data items belonging to the class.
• A relative frequency distribution is a tabular summary of a set
  of data showing the relative frequency for each class.
• The percent frequency of a class is the relative frequency
  multiplied by 100.
• A percent frequency distribution is a tabular summary of a set
  of data showing the percent frequency for each class.
Example: Marada Inn
Guests staying at Marada Inn were asked to rate the quality of
their accommodations as being excellent, above average, average,
below average, or poor. The ratings provided by a sample of 20
guests are shown below.

Below Average   Above Average   Above Average   Average
Above Average   Average         Above Average   Average
Above Average   Below Average   Poor            Excellent
Above Average   Average         Above Average   Above Average
Below Average   Poor            Above Average   Average
2
Example: Marada Inn
• Frequency Distribution

  Rating          Frequency
  Poor                2
  Below Average       3
  Average             5
  Above Average       9
  Excellent           1
  Total              20
Example: Marada Inn
• Relative Frequency and Percent Frequency Distributions

  Rating          Relative Frequency   Percent Frequency
  Poor                  .10                  10
  Below Average         .15                  15
  Average               .25                  25
  Above Average         .45                  45
  Excellent             .05                   5
  Total                1.00                 100
3
Bar Graph: Marada Inn
• Bar Graph

  [Bar graph: frequency (0-9) on the vertical axis versus rating
  (Poor, Below Average, Average, Above Average, Excellent) on the
  horizontal axis]
Pie Chart: Marada Inn
• Pie Chart

  [Pie chart of quality ratings: Above Average 45%, Average 25%,
  Below Average 15%, Poor 10%, Excellent 5%]
4
Example: Hudson Auto Repair
The manager of Hudson Auto would like to get a better picture of
the distribution of costs for engine tune-up parts. A sample of 50
customer invoices has been taken and the costs of parts, rounded
to the nearest dollar, are listed below.

 91  71 104  85  62  78  69  74  97  82
 93  72  62  88  98  57  89  68  68 101
 75  66  97  83  79  52  75 105  68 105
 99  79  77  71  79  80  75  65  69  69
 97  72  80  67  62  62  76 109  74  73
Frequency Distribution
• Guidelines for Selecting Number of Classes
  - Use between 5 and 20 classes.
  - Data sets with a larger number of elements usually require a
    larger number of classes.
  - Smaller data sets usually require fewer classes.
• Guidelines for Selecting Width of Classes
  - Use classes of equal width.
  - Approximate Class Width =
    (Largest Data Value − Smallest Data Value) / Number of Classes
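The class-width guideline above is easy to sketch in Python; the numbers used are the Hudson Auto extremes (smallest cost 52, largest 109) with six classes.

```python
# Guideline: approximate class width = (largest - smallest) / classes,
# then round up to a convenient value.
import math

def approximate_class_width(smallest, largest, num_classes):
    return (largest - smallest) / num_classes

w = approximate_class_width(52, 109, 6)   # (109 - 52)/6 = 9.5
width = math.ceil(w)                      # rounded up to a convenient 10
```

This reproduces the $10 class width used in the Hudson Auto frequency distribution.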
5
Example: Hudson Auto Repair
• Frequency Distribution
  If we choose six classes:
  Approximate Class Width = (109 − 52)/6 = 9.5 ≅ 10

  Cost ($)    Frequency
   50-59           2
   60-69          13
   70-79          16
   80-89           7
   90-99           7
  100-109          5
  Total           50
Example: Hudson Auto Repair
• Relative Frequency and Percent Frequency Distributions

  Cost ($)    Relative Frequency   Percent Frequency
   50-59            .04                   4
   60-69            .26                  26
   70-79            .32                  32
   80-89            .14                  14
   90-99            .14                  14
  100-109           .10                  10
  Total            1.00                 100
6
Histogram
• Another common graphical presentation of quantitative data is a
  histogram.
• The variable of interest is placed on the horizontal axis and
  the frequency, relative frequency, or percent frequency is
  placed on the vertical axis.
• Algorithm:
  - Find the range (width) of the data (Range = max − min).
  - Divide the range into classes:
      < 25 samples       5 to 6 classes
      25 to 50 samples   7 to 14 classes
      > 50 samples       15 to 20 classes
  - Find the relative frequency for each class.
• Unlike a bar graph, a histogram has no natural separation
  between rectangles of adjacent classes.
Example: Hudson Auto Repair
• Histogram (approximation of the probability density function)

  [Histogram: frequency (0-18) on the vertical axis versus parts
  cost ($50-$110) on the horizontal axis]
7
Cumulative Distribution
• The cumulative frequency distribution shows the number of items
  with values less than or equal to the upper limit of each class.
• The cumulative relative frequency distribution shows the
  proportion of items with values less than or equal to the upper
  limit of each class.
• The cumulative percent frequency distribution shows the
  percentage of items with values less than or equal to the upper
  limit of each class.
Example: Hudson Auto Repair
• Cumulative Distributions

  Cost ($)   Cumulative   Cumulative Relative   Cumulative Percent
             Frequency        Frequency             Frequency
  ≤ 59            2              .04                    4
  ≤ 69           15              .30                   30
  ≤ 79           31              .62                   62
  ≤ 89           38              .76                   76
  ≤ 99           45              .90                   90
  ≤ 109          50             1.00                  100
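The three cumulative distributions can be computed from the class frequencies alone. The sketch below uses the Hudson Auto frequencies from the tables above.

```python
# Cumulative frequency, cumulative relative frequency and cumulative
# percent frequency from the class frequencies.
freqs = [2, 13, 16, 7, 7, 5]          # classes 50-59, ..., 100-109
n = sum(freqs)                        # 50 invoices

cum = []
running = 0
for f in freqs:
    running += f                      # running total of items
    cum.append(running)               # cumulative frequency

cum_rel = [c / n for c in cum]        # cumulative relative frequency
cum_pct = [100 * r for r in cum_rel]  # cumulative percent frequency
```

Running this reproduces the table above: cumulative frequencies 2, 15, 31, 38, 45, 50.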
8
Ogive
• An ogive is a graph of a cumulative distribution.
• The data values are shown on the horizontal axis.
• Shown on the vertical axis are the:
  - cumulative frequencies, or
  - cumulative relative frequencies, or
  - cumulative percent frequencies.
• The frequency (one of the above) of each class is plotted as a
  point.
• The plotted points are connected by straight lines.
Example: Hudson Auto Repair
• Ogive
  - Because the class limits for the parts-cost data are 50-59,
    60-69, and so on, there appear to be one-unit gaps from 59 to
    60, 69 to 70, and so on.
  - These gaps are eliminated by plotting points halfway between
    the class limits.
  - Thus, 59.5 is used for the 50-59 class, 69.5 is used for the
    60-69 class, and so on.
9
Example: Hudson Auto Repair
• Ogive with Cumulative Percent Frequencies

  [Ogive: cumulative percent frequency (0-100) on the vertical
  axis versus parts cost ($50-$110) on the horizontal axis]
Descriptive Statistics:
Numerical Methods
• Measures of Location
• Measures of Variability
• Measures of Relative Location and Detecting Outliers
• Exploratory Data Analysis
• Measures of Association Between Two Variables
• The Weighted Mean and Working with Grouped Data
10
Measures of Location
• Mean
• Median
• Mode
• Percentiles
• Quartiles
Example: Apartment Rents
Given below is a sample of monthly rent values ($) for
one-bedroom apartments. The data is a sample of 70 apartments in
a particular city. The data are presented in ascending order.

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
11
Mean
• The mean of a data set is the average of all the data values.
• If the data are from a sample, the mean is denoted by x̄:
      x̄ = Σxᵢ / n
• If the data are from a population, the mean is denoted by μ:
      μ = Σxᵢ / N
Example: Apartment Rents
• Mean
      x̄ = Σxᵢ / n = 34356 / 70 = 490.80

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
12
Median
• The median of a data set is the value in the middle when the
  data items are arranged in ascending order.
• For an odd number of observations, the median is the middle
  value.
• For an even number of observations, the median is the average
  of the two middle values.
• A few extremely large incomes or property values can inflate
  the mean but not the median.
Example: Apartment Rents
• Median
  Median = 50th percentile
  i = (p/100)n = (50/100)·70 = 35
  Since i is an integer, average the 35th and 36th data values:
  Median = (475 + 475)/2 = 475

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
13
Mode
• The mode of a data set is the value that occurs with greatest
  frequency.
• The greatest frequency can occur at two or more different
  values.
• If the data have exactly two modes, the data are bimodal.
• If the data have more than two modes, the data are multimodal.
Example: Apartment Rents
• Mode
  450 occurred most frequently (7 times)
  Mode = 450

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
14
Percentiles
• A percentile provides information about how the data are spread
  over the interval from the smallest value to the largest value.
• The pth percentile of a data set is a value such that at least
  p percent of the items take on this value or less and at least
  (100 − p) percent of the items take on this value or more.
  - Arrange the data in ascending order.
  - Compute the index i, the position of the pth percentile:
        i = (p/100)n
  - If i is not an integer, round up. The pth percentile is the
    value in that position.
  - If i is an integer, the pth percentile is the average of the
    values in positions i and i+1.
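The percentile rule above translates directly into code. This is a minimal sketch of exactly that rule (1-based positions on sorted data), not the only percentile convention in use.

```python
# The slide's percentile rule: i = (p/100)*n; if i is not an integer,
# round up and take that position; if it is an integer, average the
# values in positions i and i+1 (1-based).
import math

def percentile(sorted_data, p):
    n = len(sorted_data)
    i = (p / 100) * n
    if i != int(i):                         # not an integer: round up
        return sorted_data[math.ceil(i) - 1]
    i = int(i)                              # integer: average positions i, i+1
    return (sorted_data[i - 1] + sorted_data[i]) / 2
```

Applied to the 70 sorted rents, p = 90 gives i = 63, so the 90th percentile is the average of the 63rd and 64th values, matching the example that follows.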
Example: Apartment Rents
• 90th Percentile
  i = (p/100)n = (90/100)·70 = 63
  Since i is an integer, average the 63rd and 64th data values:
  90th Percentile = (580 + 590)/2 = 585

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
15
Quartiles
• Quartiles are specific percentiles:
  - First Quartile = 25th Percentile
  - Second Quartile = 50th Percentile = Median
  - Third Quartile = 75th Percentile
Example: Apartment Rents
• Third Quartile
  Third quartile = 75th percentile
  i = (p/100)n = (75/100)·70 = 52.5, rounded up to 53
  Third quartile = 53rd data value = 525

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
16
Measures of Variability
• It is often desirable to consider measures of variability
  (dispersion), as well as measures of location.
• For example, in choosing supplier A or supplier B we might
  consider not only the average delivery time for each, but also
  the variability in delivery time for each.
• Range
• Interquartile Range
• Variance
• Standard Deviation
• Coefficient of Variation
Range
• The range of a data set is the difference between the largest
  and smallest data values.
• It is the simplest measure of variability.
• It is very sensitive to the smallest and largest data values.
17
Example: Apartment Rents
• Range
  Range = largest value − smallest value
  Range = 615 − 425 = 190

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Interquartile Range
• The interquartile range of a data set is the difference between
  the third quartile and the first quartile.
• It is the range for the middle 50% of the data.
• It overcomes the sensitivity to extreme data values.
18
Example: Apartment Rents
• Interquartile Range
  3rd Quartile (Q₃) = 525
  1st Quartile (Q₁) = 445
  Interquartile Range = Q₃ − Q₁ = 525 − 445 = 80

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Variance
• The variance is a measure of variability that utilizes all the
  data.
• It is based on the difference between the value of each
  observation (xᵢ) and the mean (x̄ for a sample, μ for a
  population).
19
Variance
• The variance is a measure of variability that utilizes all the
  data.
• The variance is the average of the squared differences between
  each data value and the mean.
• If the data set is a sample, the variance is denoted by s²:
      s² = Σ(xᵢ − x̄)² / (n − 1)
• If the data set is a population, the variance is denoted by σ²:
      σ² = Σ(xᵢ − μ)² / N
Standard Deviation
• The standard deviation of a data set is the positive square
  root of the variance.
• It is measured in the same units as the data, making it more
  easily comparable, than the variance, to the mean.
• If the data set is a sample, the standard deviation is denoted
  by s:
      s = √s²
• If the data set is a population, the standard deviation is
  denoted by σ (sigma):
      σ = √σ²
20
Coefficient of Variation
• The coefficient of variation indicates how large the standard
  deviation is in relation to the mean.
• If the data set is a sample, the coefficient of variation is
  computed as follows:
      (s / x̄) · 100
• If the data set is a population, the coefficient of variation
  is computed as follows:
      (σ / μ) · 100
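The sample variance, standard deviation and coefficient of variation defined above can be sketched together; the small data set used in the usage note is made up for illustration.

```python
# Sample mean, variance s**2 = sum((x - mean)**2)/(n - 1), standard
# deviation s = sqrt(s**2), and coefficient of variation (s/mean)*100.
def sample_stats(data):
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / (n - 1)   # sample variance
    std = var ** 0.5                                     # standard deviation
    cv = (std / mean) * 100                              # coefficient of variation
    return mean, var, std, cv

mean, var, std, cv = sample_stats([2, 4, 4, 4, 5, 5, 7, 9])
```

For the 70 apartment rents these formulas give s² ≈ 2996.47, s ≈ 54.74 and CV ≈ 11.15, as in the example that follows.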
Example: Apartment Rents
• Variance
      s² = Σ(xᵢ − x̄)² / (n − 1) = 2996.47
• Standard Deviation
      s = √s² = √2996.47 = 54.74
• Coefficient of Variation
      (s / x̄) · 100 = (54.74 / 490.80) · 100 = 11.15
21
Measures of Relative Location
and Detecting Outliers
• z-Scores
• Chebyshev's Theorem
• Empirical Rule
• Detecting Outliers
z-Scores
• The z-score is often called the standardized value.
• It denotes the number of standard deviations a data value xᵢ is
  from the mean:
      zᵢ = (xᵢ − x̄) / s
• A data value less than the sample mean will have a z-score less
  than zero.
• A data value greater than the sample mean will have a z-score
  greater than zero.
• A data value equal to the sample mean will have a z-score of
  zero.
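The z-score formula above can be computed for a whole data set at once. A minimal sketch, using the sample standard deviation as defined earlier:

```python
# z_i = (x_i - mean) / s, with s the sample standard deviation.
def z_scores(data):
    n = len(data)
    mean = sum(data) / n
    s = (sum((x - mean) ** 2 for x in data) / (n - 1)) ** 0.5
    return [(x - mean) / s for x in data]

print(z_scores([1, 2, 3]))   # [-1.0, 0.0, 1.0]
```

Applied to the rents with x̄ = 490.80 and s = 54.74, the smallest value 425 standardizes to about −1.20, as shown in the example that follows.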
22
Example: Apartment Rents
• z-Score of Smallest Value (425)
      z = (xᵢ − x̄) / s = (425 − 490.80) / 54.74 = −1.20

  Standardized Values for Apartment Rents
  -1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
  -0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
  -0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
  -0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
  -0.20 -0.11 -0.01 -0.01 -0.01  0.17  0.17  0.17  0.17  0.35
   0.35  0.44  0.62  0.62  0.62  0.81  1.06  1.08  1.45  1.45
   1.54  1.54  1.63  1.81  1.99  1.99  1.99  1.99  2.27  2.27
Chebyshev's Theorem
At least (1 − 1/k²) of the items in any data set will be within
k standard deviations of the mean, where k is any value greater
than 1.
  - At least 75% of the items must be within k = 2 standard
    deviations of the mean.
  - At least 89% of the items must be within k = 3 standard
    deviations of the mean.
  - At least 94% of the items must be within k = 4 standard
    deviations of the mean.
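Chebyshev's bound can be checked numerically against any data set. A small sketch; the data list below is made up, and the bound holds for it (as it must for any data).

```python
# Chebyshev: at least 1 - 1/k**2 of the items lie within k standard
# deviations of the mean, for any k > 1.
def chebyshev_bound(k):
    return 1 - 1 / k ** 2

def fraction_within(data, k):
    n = len(data)
    mean = sum(data) / n
    s = (sum((x - mean) ** 2 for x in data) / (n - 1)) ** 0.5
    return sum(abs(x - mean) <= k * s for x in data) / n

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
assert fraction_within(data, 2) >= chebyshev_bound(2)   # 0.9 >= 0.75
```

The actual fraction within k standard deviations is usually well above the bound, as the apartment-rents example below (86% observed vs. 56% guaranteed for k = 1.5) illustrates.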
23
Example: Apartment Rents
• Chebyshev's Theorem
  Let k = 1.5 with x̄ = 490.80 and s = 54.74.
  At least (1 − 1/(1.5)²) = 1 − 0.44 = 0.56, or 56%, of the rent
  values must be between
      x̄ − k·s = 490.80 − 1.5(54.74) = 409
  and
      x̄ + k·s = 490.80 + 1.5(54.74) = 573
Example: Apartment Rents
• Chebyshev's Theorem (continued)
  Actually, 86% of the rent values are between 409 and 573.

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
24
Empirical Rule
For data having a bell-shaped distribution:
  - Approximately 68% of the data values will be within one
    standard deviation of the mean.
  - Approximately 95% of the data values will be within two
    standard deviations of the mean.
  - Almost all (99.7%) of the data values will be within three
    standard deviations of the mean.
Example: Apartment Rents
• Empirical Rule

  Interval                        % in Interval
  Within ±1s: 436.06 to 545.54    48/70 = 69%
  Within ±2s: 381.32 to 600.28    68/70 = 97%
  Within ±3s: 326.58 to 655.02    70/70 = 100%

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
26
Example: Apartment Rents
• Five-Number Summary
  Lowest Value   = 425
  First Quartile = 445
  Median         = 475
  Third Quartile = 525
  Largest Value  = 615

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Measures of Association
Between Two Variables
• Covariance
  - The covariance is a measure of the linear association between
    two variables.
  - Positive values indicate a positive relationship.
  - Negative values indicate a negative relationship.
• Correlation Coefficient
  - The coefficient can take on values between −1 and +1.
  - Values near −1 indicate a strong negative linear relationship.
  - Values near +1 indicate a strong positive linear relationship.
27
Covariance
• If the data sets are samples, the covariance is denoted by sxy:
      sxy = Σ(xᵢ − x̄)(yᵢ − ȳ) / (n − 1)
• If the data sets are populations, the covariance is denoted
  by σxy:
      σxy = Σ(xᵢ − μx)(yᵢ − μy) / N
Correlation Coefficient
• If the data sets are samples, the coefficient is rxy:
      rxy = sxy / (sx·sy)
• If the data sets are populations, the coefficient is ρxy:
      ρxy = σxy / (σx·σy)
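The sample covariance and correlation formulas above fit in a few lines; note that the covariance of a variable with itself is its sample variance, which is how sx and sy are obtained below.

```python
# Sample covariance s_xy = sum((x - x_bar)(y - y_bar)) / (n - 1) and
# correlation r_xy = s_xy / (s_x * s_y).
def sample_cov(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)

def sample_corr(x, y):
    sx = sample_cov(x, x) ** 0.5        # sample standard deviation of x
    sy = sample_cov(y, y) ** 0.5        # sample standard deviation of y
    return sample_cov(x, y) / (sx * sy)
```

For a perfectly linear pair such as x = [1, 2, 3], y = [2, 4, 6], the correlation is +1, the strong-positive extreme mentioned above.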
28
Grouped Data
• The weighted mean computation can be used to obtain
  approximations of the mean, variance, and standard deviation
  for grouped data.
• To compute the weighted mean, we treat the midpoint of each
  class as though it were the mean of all items in the class.
• We compute a weighted mean of the class midpoints using the
  class frequencies as weights.
• Similarly, in computing the variance and standard deviation,
  the class frequencies are used as weights.
Mean for Grouped Data
• Sample Data
      x̄ = Σ fᵢMᵢ / Σ fᵢ
• Population Data
      μ = Σ fᵢMᵢ / N
  where:
      fᵢ = frequency of class i
      Mᵢ = midpoint of class i
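The grouped-data approximations above (frequency-weighted midpoints) can be sketched as follows, using the rent classes from the example that follows.

```python
# Grouped-data mean: frequency-weighted average of class midpoints.
# Grouped-data sample variance uses the frequencies as weights too.
def grouped_mean(freqs, mids):
    n = sum(freqs)
    return sum(f * m for f, m in zip(freqs, mids)) / n

def grouped_sample_variance(freqs, mids):
    n = sum(freqs)
    mean = grouped_mean(freqs, mids)
    return sum(f * (m - mean) ** 2 for f, m in zip(freqs, mids)) / (n - 1)

freqs = [8, 17, 12, 8, 7, 4, 2, 4, 2, 6]
mids = [429.5, 449.5, 469.5, 489.5, 509.5, 529.5, 549.5, 569.5, 589.5, 609.5]
mean = grouped_mean(freqs, mids)                  # ≈ 493.21
var = grouped_sample_variance(freqs, mids)        # ≈ 3017.89
```

Both values are approximations: they differ slightly from the exact sample mean (490.80) and standard deviation (54.74) computed from the raw data.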
29
Example: Apartment Rents
Given below is the previous sample of monthly rents for
one-bedroom apartments presented here as grouped data in the form
of a frequency distribution.

  Rent ($)    Frequency
  420-439          8
  440-459         17
  460-479         12
  480-499          8
  500-519          7
  520-539          4
  540-559          2
  560-579          4
  580-599          2
  600-619          6
Example: Apartment Rents
• Mean for Grouped Data

  Rent ($)     fᵢ      Mᵢ       fᵢMᵢ
  420-439       8    429.5    3436.0
  440-459      17    449.5    7641.5
  460-479      12    469.5    5634.0
  480-499       8    489.5    3916.0
  500-519       7    509.5    3566.5
  520-539       4    529.5    2118.0
  540-559       2    549.5    1099.0
  560-579       4    569.5    2278.0
  580-599       2    589.5    1179.0
  600-619       6    609.5    3657.0
  Total        70            34525.0

      x̄ = 34525 / 70 = 493.21

  This approximation differs by $2.41 from the actual sample mean
  of $490.80.
30
Variance for Grouped Data
• Sample Data
      s² = Σ fᵢ(Mᵢ − x̄)² / (n − 1)
  For the apartment rents the variance is s² = 3017.89, while the
  standard deviation is s = √3017.89 = 54.94.
  This approximation differs by only $.20 from the actual
  standard deviation of $54.74.
• Population Data
      σ² = Σ fᵢ(Mᵢ − μ)² / N
End of Chapter 6
31
VII
Point
Estimation
Sampling and Sampling
Distributions
• Simple Random Sampling
• Point Estimation
• Introduction to Sampling Distributions
• Sampling Distribution of x̄
• Sampling Distribution of s
• Sampling Distribution of p̄
• Properties of Point Estimators
• Other Sampling Methods
• Other Estimation Methods
1
Point Estimation
• In point estimation we use the data from the sample to compute
  a value of a sample statistic that serves as an estimate of a
  population parameter.
  Let X be a random variable with distribution f(x, θ) which
  depends on an unknown parameter θ (if it depends on more
  parameters, we will consider them separately).
  To estimate θ we take a random sample x₁, x₂, …, xₙ and compute
  a function t = t(x₁, x₂, …, xₙ) that estimates θ. For each
  random sample we obtain a number estimating θ.
• We refer to x̄ as the point estimator of the population mean μ.
• s is the point estimator of the population standard
  deviation σ.
• p̄ is the point estimator of the population proportion p.
Sampling Error
• The absolute difference between an unbiased point estimate and
  the corresponding population parameter is called the sampling
  error.
• Sampling error is the result of using a subset of the
  population (the sample), and not the entire population, to
  develop estimates.
• The sampling errors are:
      |x̄ − μ|   for the sample mean
      |s − σ|   for the sample standard deviation
      |p̄ − p|   for the sample proportion
2
Example: St. Andrew's
St. Andrew's University receives 900 applications annually from
prospective students. The application forms contain a variety of
information including the individual's scholastic aptitude test
(SAT) score and whether or not the individual desires on-campus
housing.
The director of admissions would like to know the following
information:
  - the average SAT score for the applicants, and
  - the proportion of applicants that want to live on campus.
We will now look at three alternatives for obtaining the desired
information:
  - conducting a census of the entire 900 applicants;
  - selecting a sample of 30 applicants, using a random number
    table;
  - selecting a sample of 30 applicants, using computer-generated
    random numbers.
Example: St. Andrew's
• Taking a Census of the 900 Applicants
  - SAT Scores
    Population Mean:
        μ = Σxᵢ / 900 = 990
    Population Standard Deviation:
        σ = √( Σ(xᵢ − μ)² / 900 ) = 80
  - Applicants Wanting On-Campus Housing
    Population Proportion:
        p = 648 / 900 = .72
3
Example: St. Andrew's
• Take a Sample of 30 Applicants Using Computer-Generated Random
  Numbers
  - 900 random numbers are generated, one for each applicant in
    the population.
  - Then we choose the 30 applicants corresponding to the 30
    smallest random numbers as our sample.
  - Each of the 900 applicants has the same probability of being
    included.
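The steps above can be sketched in Python. This mimics the Excel procedure shown next: attach a random number to every applicant and keep the items with the 30 smallest numbers.

```python
# Simple random sample via random keys: equivalent to Excel's =RAND()
# column followed by sorting and taking the first n rows.
import random

def simple_random_sample(population, n, seed=None):
    rng = random.Random(seed)
    keyed = [(rng.random(), item) for item in population]   # =RAND() column
    keyed.sort()                                            # sort by random key
    return [item for _, item in keyed[:n]]                  # n smallest keys

applicants = list(range(1, 901))      # applicant numbers 1..900
sample = simple_random_sample(applicants, 30, seed=0)
```

Because every assignment of random keys is equally likely, every subset of 30 applicants has the same probability of being selected, which is exactly the definition of a simple random sample from a finite population.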
Using Excel to Select a Simple Random Sample
• Formula Worksheet

  Applicant   SAT     On-Campus   Random
  Number      Score   Housing     Number
  1           1008    Yes         =RAND()
  2           1025    No          =RAND()
  3            952    Yes         =RAND()
  4           1090    Yes         =RAND()
  5           1127    Yes         =RAND()
  6           1015    No          =RAND()
  7            965    Yes         =RAND()
  8           1161    No          =RAND()

  Note: Rows 10-901 are not shown.
4
Using Excel to Select a Simple Random Sample
• Value Worksheet

  Applicant   SAT     On-Campus   Random
  Number      Score   Housing     Number
  1           1008    Yes         0.41327
  2           1025    No          0.79514
  3            952    Yes         0.66237
  4           1090    Yes         0.00234
  5           1127    Yes         0.71205
  6           1015    No          0.18037
  7            965    Yes         0.71607
  8           1161    No          0.90512

  Note: Rows 10-901 are not shown.
Using Excel to Select a Simple Random Sample
• Value Worksheet (Sorted)

  Applicant   SAT     On-Campus   Random
  Number      Score   Housing     Number
  12          1107    No          0.00027
  773         1043    Yes         0.00192
  408          991    Yes         0.00303
  58          1008    No          0.00481
  116         1127    Yes         0.00538
  185          982    Yes         0.00583
  510         1163    Yes         0.00649
  394         1008    No          0.00667

  Note: Rows 10-901 are not shown.
5
Example: St. Andrew's
• Point Estimates
  - x̄ as Point Estimator of μ:
        x̄ = Σxᵢ / 30 = 29,910 / 30 = 997
  - s as Point Estimator of σ:
        s = √( Σ(xᵢ − x̄)² / 29 ) = √(163,996 / 29) = 75.2
  - p̄ as Point Estimator of p:
        p̄ = 20 / 30 = .67
  Note: Different random numbers would have identified a
  different sample which would have resulted in different point
  estimates.
Sampling Distribution of x̄
• Process of Statistical Inference
  - A simple random sample of n elements is selected from the
    population (with mean μ = ?).
  - The sample data provide a value for the sample mean x̄.
  - The value of x̄ is used to make inferences about the value
    of μ.
6
Properties of Point Estimators
• Before using a sample statistic as a point estimator,
  statisticians check to see whether the sample statistic has the
  following properties associated with good point estimators:
  - Unbiasedness
  - Efficiency
  - Consistency
Properties of Point Estimators
• Unbiasedness
  If the expected value of the sample statistic is equal to the
  population parameter being estimated, the sample statistic is
  said to be an unbiased estimator of the population parameter.

      E(x̄) = (E(x₁) + E(x₂) + … + E(xₙ)) / n = nμ/n = μ

      E(s̃²) = ((n − 1)/n)·σ²  ⇒  s̃² = (1/n)Σᵢ(xᵢ − x̄)² is biased,
      while s² = Σᵢ(xᵢ − x̄)² / (n − 1) is unbiased.

      E(p̄) = (E(x₁) + E(x₂) + … + E(xₙ)) / n = np/n = p,
      where xᵢ = 1 if A occurs, and xᵢ = 0 otherwise.
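The bias of the 1/n variance estimator can be seen in a short simulation. A sketch, assuming standard normal data (so σ² = 1); the trial count is arbitrary.

```python
# Dividing the sum of squared deviations by n underestimates sigma**2
# on average (expected value (n-1)/n * sigma**2); dividing by n - 1
# does not.
import random

random.seed(1)
n, trials = 5, 20000
biased_sum = unbiased_sum = 0.0
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(n)]   # population sigma**2 = 1
    m = sum(xs) / n
    ss = sum((x - m) ** 2 for x in xs)            # sum of squared deviations
    biased_sum += ss / n
    unbiased_sum += ss / (n - 1)

biased_mean = biased_sum / trials      # ≈ (n-1)/n = 0.8
unbiased_mean = unbiased_sum / trials  # ≈ 1.0
```

With n = 5 the biased estimator averages about 0.8·σ², matching the factor (n − 1)/n from the derivation above.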
7
Properties of Point Estimators
• Efficiency
  Given the choice of two unbiased estimators of the same
  population parameter, we would prefer to use the point
  estimator with the smaller standard deviation, since it tends
  to provide estimates closer to the population parameter.
  Example:
      μ ≈ x̄ = (1/5)Σᵢ₌₁⁵ xᵢ,
      μ ≈ x̃ = (1/5)Σᵢ₌₁³ xᵢ + (3/10)x₄ + (1/10)x₅
      ⇒ E(x̄) = E(x̃) = μ
  but:
      s(x̄) = σ/√5 = √(20/100)·σ, while
      s(x̃) = √( (1/25)(σ² + σ² + σ²) + (9/100)σ² + (1/100)σ² )
            = √(22/100)·σ,
  so x̄ is the more efficient estimator.
Properties of Point Estimators
• Consistency
  A point estimator is consistent if the values of the point
  estimator tend to become closer to the population parameter as
  the sample size becomes larger:
      lim(n→∞) P(|t(x₁, x₂, …, xₙ) − θ| ≥ ε) = 0, ∀ε > 0
  Example (by Chebyshev's inequality):
      P(|x̄ − μ| ≥ ε) = P(|x̄ − E(x̄)| ≥ ε) ≤ Var(x̄)/ε² = σ²/(nε²) → 0
      as n → ∞,
  so x̄ is a consistent estimator of μ.
8
Standard Deviations
• Standard Deviation of x̄
  - Finite Population:
        σx̄ = (σ/√n)·√((N − n)/(N − 1))
  - Infinite Population:
        σx̄ = σ/√n
• Standard Deviation of p̄
  - Finite Population:
        σp̄ = √( p(1 − p)/n )·√((N − n)/(N − 1))
  - Infinite Population:
        σp̄ = √( p(1 − p)/n )
• A finite population is treated as being infinite if n/N < .05.
• √((N − n)/(N − 1)) is the finite correction factor.
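The two formulas for σx̄ above differ only by the finite correction factor; a minimal sketch, using the St. Andrew's numbers (σ = 80, n = 30, N = 900):

```python
# Standard error of the sample mean, without and with the finite
# population correction factor sqrt((N - n) / (N - 1)).
def se_mean_infinite(sigma, n):
    return sigma / n ** 0.5

def se_mean_finite(sigma, n, N):
    fpc = ((N - n) / (N - 1)) ** 0.5      # finite correction factor < 1
    return se_mean_infinite(sigma, n) * fpc

se_inf = se_mean_infinite(80, 30)         # 80/sqrt(30) ≈ 14.6
se_fin = se_mean_finite(80, 30, 900)      # slightly smaller
```

Since n/N = 30/900 ≈ .033 < .05 here, the population may be treated as infinite and the correction makes little difference.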
Sampling Distribution of x̄
• If we use a large (n > 30) simple random sample, the central
  limit theorem enables us to conclude that the sampling
  distribution of x̄ can be approximated by a normal probability
  distribution.
• When the simple random sample is small (n < 30), the sampling
  distribution of x̄ can be considered normal only if we assume
  the population has a normal probability distribution.
9
Example: St. Andrew's
• Sampling Distribution of x̄ for the SAT Scores
      E(x̄) = μ = 990
      σx̄ = σ/√n = 80/√30 = 14.6
x
Example: St. Andrew's
• Sampling Distribution of x̄ for the SAT Scores
  What is the probability that a simple random sample of 30
  applicants will provide an estimate of the population mean SAT
  score that is within plus or minus 10 of the actual population
  mean μ?
  In other words, what is the probability that x̄ will be between
  980 and 1000?
10
Example: St. Andrew's
• Sampling Distribution of x̄ for the SAT Scores
  [Sampling distribution of x̄: area .2518 on each side of the
  mean, over the interval from 980 through 990 to 1000]
  Using the standard normal probability table with
  z = 10/14.6 = .68, we have area = (.2518)(2) = .5036.
Example: St. Andrew's
• Sampling Distribution of p̄ for On-Campus Housing
      E(p̄) = .72
      σp̄ = √( .72(1 − .72)/30 ) = .082
  The normal probability distribution is an acceptable
  approximation since np = 30(.72) = 21.6 > 5 and
  n(1 − p) = 30(.28) = 8.4 > 5.
11
Example: St. Andrew's
• Sampling Distribution of p̄ for On-Campus Housing
  What is the probability that a simple random sample of 30
  applicants will provide an estimate of the population
  proportion of applicants desiring on-campus housing that is
  within plus or minus .05 of the actual population proportion?
  In other words, what is the probability that p̄ will be between
  .67 and .77?
  [Sampling distribution of p̄: area .2291 on each side of the
  mean, over the interval from .67 through .72 to .77]
  For z = .05/.082 = .61, the area = (.2291)(2) = .4582.
  The probability is .4582 that the sample proportion will be
  within +/− .05 of the actual population proportion.
12
Other Sampling Methods
• Stratified Random Sampling
• Cluster Sampling
• Systematic Sampling
• Convenience Sampling
• Judgment Sampling
Stratified Random Sampling
• The population is first divided into groups of elements called
  strata.
• Each element in the population belongs to one and only one
  stratum.
• Best results are obtained when the elements within each stratum
  are as much alike as possible (i.e. a homogeneous group).
• A simple random sample is taken from each stratum.
• Advantage: If strata are homogeneous, this method is as
  "precise" as simple random sampling but with a smaller total
  sample size.
• Example: The basis for forming the strata might be department,
  location, age, industry type, etc.
13
Cluster Sampling
• The population is first divided into separate groups of
  elements called clusters.
• Ideally, each cluster is a representative small-scale version
  of the population (i.e. a heterogeneous group).
• A simple random sample of the clusters is then taken.
• All elements within each sampled (chosen) cluster form the
  sample.
• Advantage: The close proximity of elements can be cost
  effective (i.e. many sample observations can be obtained in a
  short time).
• Disadvantage: This method generally requires a larger total
  sample size than simple or stratified random sampling.
• Example: A primary application is area sampling, where clusters
  are city blocks or other well-defined areas.
Systematic Sampling
• If a sample size of n is desired from a population containing
  N elements, we might sample one element for every N/n elements
  in the population.
• We randomly select one of the first N/n elements from the
  population list.
• We then select every (N/n)th element that follows in the
  population list.
• Advantage: The sample usually will be easier to identify than
  it would be if simple random sampling were used.
• Example: Selecting every 100th listing in a telephone book
  after the first randomly selected listing.
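The systematic sampling procedure above is a one-liner once the interval N/n is fixed. A sketch, assuming N is a multiple of n so the interval is a whole number:

```python
# Systematic sampling: random start among the first N//n elements,
# then every (N//n)-th element after it.
import random

def systematic_sample(population, n, seed=None):
    N = len(population)
    step = N // n                              # sampling interval N/n
    start = random.Random(seed).randrange(step)  # random start in first interval
    return population[start::step][:n]

listing = list(range(1000))                    # e.g. a telephone book, N = 1000
sample = systematic_sample(listing, 10, seed=0)  # every 100th listing
```

With N = 1000 and n = 10 this is exactly the "every 100th listing" example: one random start among the first 100, then a fixed stride of 100.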
14
Convenience Sampling
• It is a nonprobability sampling technique. Items are included
  in the sample without known probabilities of being selected.
• The sample is identified primarily by convenience.
• Advantage: Sample selection and data collection are relatively
  easy.
• Disadvantage: It is impossible to determine how representative
  of the population the sample is.
• Example: A professor conducting research might use student
  volunteers to constitute a sample.
Judgment Sampling
• The person most knowledgeable on the subject of the study
  selects elements of the population that he or she feels are
  most representative of the population.
• It is a nonprobability sampling technique.
• Advantage: It is a relatively easy way of selecting a sample.
• Disadvantage: The quality of the sample results depends on the
  judgment of the person selecting the sample.
• Example: A reporter might sample three or four senators,
  judging them as reflecting the general opinion of the senate.
15
Other Estimation Methods
z
Maximum Likelihood
We look for the maximum of the likelihood function:
max_θ L(x1, x2, …, xn, θ) = max_θ ∏ f(xi, θ)   (product over i = 1, …, n)
∂L/∂θ = 0  ⇒  θ̂ = θ̂(x1, x2, …, xn)
z
Least Squares
min_θ ( t(x1, x2, …, xn) − θ )²
Systematic Sampling
Convenience Sampling
Judgment Sampling
End of Chapter 7
16
VIII
Interval
Estimation
Interval Estimation
Interval Estimation – Basic Method
Interval Estimation of a Population Mean:
Large-Sample Case
Interval Estimation of a Population Mean:
Small-Sample Case
Determining the Sample Size
Interval Estimation of a Population
Proportion
Interval Estimation of a Population
Variance
[Figure: several confidence intervals [--------- x ---------] constructed
around sample means; most of them cover the population mean μ]
1
Interval Estimation
A point estimator alone is not sufficient, because it carries no
information about the estimation error or our confidence in it.
So, instead of one, we use two
estimators t1 = t1(x1, x2, …, xn) and
t2 = t2(x1, x2, …, xn) of the unknown
parameter θ, such that:
P(t1 ≤ θ ≤ t2) = 1 - α
(confidence probability)
The probability that the
parameter θ is in the
confidence interval (t1, t2)
Interval Estimation – basic method
Let θ̂ be an estimator of θ such that E(θ̂) = θ. We suppose
that the sample is normally distributed or its size is n ≥ 30.
Then z = (θ̂ − θ)/σθ̂ has the standard normal z(0,1) distribution, so
P(- zα/2 ≤ z ≤ zα/2) = 1 − α
P( θ̂ − zα/2 ⋅ σθ̂ ≤ θ ≤ θ̂ + zα/2 ⋅ σθ̂ ) = 1 − α
Examples:
z0.025 = 1.96 ⇒ P(θ ∈ θ̂ ± 1.96 ⋅ σθ̂) = 0.95
z0.05 = 1.65 ⇒ P(θ ∈ θ̂ ± 1.65 ⋅ σθ̂) = 0.90
z0.33 = 0.44 ⇒ P(θ ∈ θ̂ ± 0.44 ⋅ σθ̂) = 0.34
[Figure: sampling distribution of θ̂; the middle 1 - α of all θ̂ values
lies between the two tails of area α/2 each]
There is a 1 - α probability that the value of a sample
mean will provide a sampling error of zα/2 σθ̂ or less.
2
Interval Estimate of a
Population Mean: (n > 30)
z
With σ known
where:
P( μ ∈ x ± zα/2 ⋅ σ/√n ) = 1 − α
x is the sample mean
1 − α is the confidence probability
zα/2 is the z value providing an area of
α/2 in the upper tail of the standard
normal probability distribution
σ is the population standard deviation
n is the sample size
If n ≥ 30, x has an approximately normal distribution with mean μ
and variance σ²/n.
Interval Estimate of a
Population Mean: (n > 30)
z
With σ unknown
In most applications the value of the population
standard deviation is unknown. We simply use the
value of the sample standard deviation, s, as the
point estimate of the population standard deviation.
For n ≥ 30 this is quite acceptable.
P( μ ∈ x ± zα/2 ⋅ s/√n ) = 1 − α
3
Example: National Discount, Inc.
z
Large-Sample Case (n ≥ 30) with σ Unknown
National Discount has 260 retail outlets throughout the
United States. National evaluates each potential location
for a new retail outlet in part on the mean annual income of
the individuals in the marketing area of the new location.
Sampling can be used to develop an interval estimate of the
mean annual income for individuals in a potential marketing
area for National Discount.
A sample of size n = 36 was taken. The sample mean, x ,
is $21,100 and the sample standard deviation, s, is $4500.
We will use .95 as the confidence coefficient in our interval
estimate.
Example: National Discount, Inc.
There is a .95 probability that the value of a sample
mean for National Discount will provide a sampling error
of $1,470 or less, determined as follows:
95% of the sample means that can be observed are within
± 1.96 σx of the population mean μ.
If σx = s/√n = 4500/√36 = 750, then 1.96 ⋅ 750 = 1470.
Interval Estimate of μ is:
$21,100 ± $1,470
or $19,630 to $22,570
We are 95% confident that the interval contains the
population mean.
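The interval above can be checked with a short Python sketch (the figures are the slide's; the helper name is ours):

```python
import math

def mean_interval_large(xbar, s, n, z):
    """Large-sample (n >= 30) interval estimate: xbar +/- z * s / sqrt(n)."""
    e = z * s / math.sqrt(n)       # margin of error (sampling error)
    return xbar - e, xbar + e

# National Discount: n = 36, mean $21,100, s = $4,500, 95% confidence
lo, hi = mean_interval_large(xbar=21100, s=4500, n=36, z=1.96)
```

With s/√n = 750 the margin is 1.96 · 750 = $1,470, giving $19,630 to $22,570.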
4
Interval Estimation of a
Population Mean: (n < 30)
z
Population is Not Normally Distributed
The only option is to increase the sample size to
n > 30 and use the large-sample interval-estimation
procedures.
z
Population is Normally Distributed and σ is Known
The large-sample interval-estimation procedure can
be used.
z
Population is Normally Distributed and σ is Unknown
The appropriate interval estimate is based on a
probability distribution known as the t distribution.
Interval Estimation of a
Population Mean: (n < 30)
z
Interval Estimate
P( μ ∈ x ± tα/2 ⋅ s/√n ) = 1 − α
where
1 - α is the confidence coefficient
tα/2 is the t value providing an area of α/2
in the upper tail of a t distribution with
n - 1 degrees of freedom
s is the sample standard deviation
The sample comes from a normal distribution, but it is not correct to
substitute σ with s (because n < 30). In that case, the random
variable (x − μ)/(s/√n) has a t distribution with n − 1 degrees of
freedom (a ratio built from a normal and a χ² random variable).
5
Example: Apartment Rents
z
Small-Sample Case (n < 30) with σ Unknown
A reporter for a student newspaper is writing an article
on the cost of off-campus housing. A sample of 10
one-bedroom units within a half-mile of campus
resulted in a sample mean of $550 per month and a
sample standard deviation of $60.
Let us provide a 95% confidence interval estimate of
the mean rent per month for the population of one-
bedroom units within a half-mile of campus. We'll
assume this population to be normally distributed.
Example: Apartment Rents
z
t Value
At 95% confidence, 1 - α = .95, α = .05, and α/2 = .025.
t.025 is based on n - 1 = 10 - 1 = 9 degrees of freedom.
In the t distribution table we see that t.025 = 2.262.
Degrees                  Area in Upper Tail
of Freedom    .10     .05     .025    .01     .005
    .           .       .       .       .       .
    7         1.415   1.895   2.365   2.998   3.499
    8         1.397   1.860   2.306   2.896   3.355
    9         1.383   1.833   2.262   2.821   3.250
   10         1.372   1.812   2.228   2.764   3.169
    .           .       .       .       .       .
6
Example: Apartment Rents
x ± t.025 ⋅ s/√n
550 ± 2.262 ⋅ 60/√10
550 ± 42.92
or $507.08 to $592.92
We are 95% confident that the mean rent per month
for the population of one-bedroom units within a half-
mile of campus is between $507.08 and $592.92.
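The same arithmetic in Python, with the t value taken from the table above (the helper name is ours):

```python
import math

T_025_9DF = 2.262   # t value for alpha/2 = .025 with 9 d.f. (from the t table)

def mean_interval_small(xbar, s, n, t):
    """Small-sample interval estimate: xbar +/- t * s / sqrt(n),
    assuming the population is normally distributed."""
    e = t * s / math.sqrt(n)
    return xbar - e, xbar + e

# apartment rents: n = 10, mean $550, s = $60, 95% confidence
lo, hi = mean_interval_small(xbar=550, s=60, n=10, t=T_025_9DF)
```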
Example: Evacuate Victims
According to one study, "The majority of people who die from fire and
smoke in compartmented fire-resistive buildings - the type used for
hotels - die in the attempt to evacuate." (Risk Management, Feb.
1986). The following data represent the number of victims who
attempted to evacuate for a sample of 14 recent fires:
Fire                             Died   Fire                             Died
Las Vegas Hilton (Las Vegas)       5    Howard J. (New Orleans)            5
Inn on the Park (Toronto)          5    Cornell Univ. (Ithaca)             9
Westchase Hilton (Houston)         8    Westport Central (Kansas C.)       4
Holiday Inn (Cambridge, OH)       10    Orrington (Evanston, Illinois)     0
Conrad Hilton (Chicago)            4    Hartford Hospital (Hartford)      16
Providence (Providence)            8    Milford Plaza (New York)           0
Baptist Towers (Atlanta)           7    MGM Grand (Las Vegas)             36
Construct a 98% confidence interval for the true mean number of victims
per fire. What is the confidence of the interval with ±3 victims
around the mean?
7
Example: Evacuate Victims
x = (5 + 5 + 8 + 10 + 4 + 8 + 7 + 5 + 9 + 4 + 0 + 16 + 0 + 36) / 14 = 117 / 14 = 8.36
s² = (3.36² + 3.36² + 0.36² + 1.64² + 4.36² + 0.36² + 1.36² + 3.36² + 0.64² +
4.36² + 8.36² + 7.64² + 8.36² + 27.64²) / 13 = 79.94 ⇒ s = 8.94
1 − α = 0.98 ⇒ α/2 = 0.01 ⇒ t(13, .01) = 2.650
x ± t(13, .01) ⋅ s/√n = 8.36 ± 2.650 ⋅ 8.94/√14 = 8.36 ± 6.33
If we want 99% confidence:
x ± t(13, .005) ⋅ s/√n = 8.36 ± 3.012 ⋅ 8.94/√14 = 8.36 ± 7.2
If we want to find the confidence of the interval 8.36 ± 3:
3 = t(13, α/2) ⋅ 8.94/√14 ⇒ t(13, α/2) = 3 ⋅ √14/8.94 = 1.2555
⇒ α/2 > 0.1 ⇒ 1 − α < 0.80
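The sample statistics and the 98% margin above can be reproduced directly from the data (the variable names are ours):

```python
import math

deaths = [5, 5, 8, 10, 4, 8, 7, 5, 9, 4, 0, 16, 0, 36]  # victims per fire
n = len(deaths)
xbar = sum(deaths) / n                                   # sample mean
s2 = sum((x - xbar) ** 2 for x in deaths) / (n - 1)      # sample variance
s = math.sqrt(s2)

t_01_13df = 2.650                                        # t table, alpha/2 = .01, 13 d.f.
margin = t_01_13df * s / math.sqrt(n)                    # half-width of the 98% interval
```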
Sample Size for an Interval Estimate
of a Population Mean
Let E = the maximum sampling error mentioned in the
precision statement.
E is the amount added to and subtracted from the point
estimate to obtain an interval estimate.
E is often referred to as the margin of error.
We have
E = zα/2 ⋅ σ/√n
Solving for n we have
n = (zα/2)² σ² / E²
8
Example: National Discount, Inc.
Suppose that National's management team wants an estimate
of the population mean such that there is a .95 probability
that the sampling error is $500 or less.
How large a sample size is needed to meet the required
precision?
zα/2 ⋅ σ/√n = 500
At 95% confidence, z.025 = 1.96.
Recall that σ = 4,500.
Solving for n we have n = (1.96)² ⋅ (4500)² / 500² = 311.17
We need to sample 312 to reach a desired precision of
± $500 at 95% confidence.
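The sample-size formula, rounding up to the next whole observation, can be sketched as (helper name is ours):

```python
import math

def sample_size_mean(z, sigma, e):
    """Sample size so the margin of error is at most e: n = (z*sigma/e)^2,
    rounded up to the next whole observation."""
    return math.ceil((z * sigma / e) ** 2)

# National Discount: 95% confidence, sigma = 4500, desired error $500
n = sample_size_mean(z=1.96, sigma=4500, e=500)
```

The raw value 311.17 rounds up to the recommended n = 312.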
Interval Estimation
of a Population Proportion
z
Interval Estimate:  P( p ∈ p̄ ± zα/2 ⋅ √( p̄(1 − p̄)/n ) ) = 1 − α
where:
1 - α is the confidence coefficient
zα/2 is the z value providing an area of
α/2 in the upper tail of the standard
normal probability distribution
p̄ is the sample proportion
9
Example: Political Science, Inc.
z
Interval Estimation of a Population Proportion
Political Science, Inc. (PSI) specializes in voter polls and
surveys designed to keep political office seekers informed
of their position in a race. Using telephone surveys,
interviewers ask registered voters who they would vote for
if the election were held that day.
In a recent election campaign, PSI found that 220
registered voters, out of 500 contacted, favored a
particular candidate. PSI wants to develop a 95%
confidence interval estimate for the proportion of the
population of registered voters that favors the candidate.
Example: Political Science, Inc.
p̄ ± zα/2 ⋅ √( p̄(1 − p̄)/n )
where: n = 500, p̄ = 220/500 = .44, zα/2 = 1.96
.44 ± 1.96 ⋅ √( .44(1 − .44)/500 )
.44 ± .0435
PSI is 95% confident that the proportion of all voters
that favors the candidate is between .3965 and .4835.
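A minimal Python check of the proportion interval (the helper name is ours):

```python
import math

def proportion_interval(p, n, z):
    """Interval estimate of a population proportion: p +/- z * sqrt(p(1-p)/n)."""
    e = z * math.sqrt(p * (1 - p) / n)
    return p - e, p + e

# PSI poll: 220 of 500 voters favor the candidate, 95% confidence
lo, hi = proportion_interval(p=220 / 500, n=500, z=1.96)
```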
10
Sample Size for an Interval Estimate
of a Population Proportion
z
Let E = the maximum sampling error mentioned in the
precision statement.
We have
E = zα/2 ⋅ √( p(1 − p)/n )
Solving for n we have
n = (zα/2)² p(1 − p) / E²
Example: Political Science, Inc.
Suppose that PSI would like a .99 probability that the sample
proportion is within ± .03 of the population proportion.
How large a sample size is needed to meet the required
precision?
At 99% confidence, z.005 = 2.576.
n = (zα/2)² p(1 − p) / E² = (2.576)² (.44)(.56) / (.03)² ≅ 1817
Note: We used .44 as the best estimate of p in the
above expression. If no information is available
about p, then .5 is often assumed because it provides
the highest possible sample size. If we had used
p = .5, the recommended n would have been 1843.
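The proportion sample-size formula, rounded up, can be sketched as (helper name is ours):

```python
import math

def sample_size_proportion(z, p, e):
    """Sample size so the margin of error for a proportion is at most e."""
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)

# PSI: 99% confidence, planning value p = .44, desired error .03
n = sample_size_proportion(z=2.576, p=0.44, e=0.03)
```

The raw value 1816.73 rounds up to n = 1817.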
11
Interval Estimation of σ 2
z
Interval Estimate of a Population Variance
P( (n − 1)s² / χ²α/2  ≤  σ²  ≤  (n − 1)s² / χ²(1−α/2) ) = 1 − α
where the χ² values are based on a chi-square distribution
with n − 1 degrees of freedom and where 1 − α is the
confidence coefficient.
If the xi have a normal distribution N(μ, σ²), then
s² = (1/(n − 1)) Σ (xi − x)², and
χ² = (n − 1)s² / σ²
has a χ² distribution with n − 1 degrees of freedom (a sum of
squares of standard normal N(0,1) random variables).
The interval then follows from P( χ²(1−α/2) ≤ χ² ≤ χ²α/2 ) = 1 − α.
Taking the square root of the upper and lower limits of the
variance interval provides the confidence interval for the
population standard deviation.
Example: Buyer’s Digest
Buyer's Digest rates thermostats manufactured for home tempe-
rature control. In a recent test, 10 thermostats manufactured by
ThermoRite were selected and placed in a test room that was
maintained at a temperature of 68°F. The temperature readings
of the ten thermostats are listed below.
We will use the 10 readings to develop a 95% confidence interval
estimate of the population variance.
Therm.   1     2     3     4     5     6     7     8     9    10
Temp.  67.4  67.8  68.2  69.3  69.5  67.0  68.1  68.6  67.9  67.2
12
Example: Buyer’s Digest
n - 1 = 10 - 1 = 9 degrees of freedom and α = .05
χ².975 ≤ (n − 1)s² / σ² ≤ χ².025
2.70 ≤ (n − 1)s² / σ² ≤ 19.02
[Figure: χ² density with the middle area of 0.95 between
χ².975 = 2.70 and χ².025 = 19.02]
Example: Buyer’s Digest
Sample variance s² provides a point estimate of σ².
s² = Σ (xi − x)² / (n − 1) = 6.3 / 9 = .70
A 95% confidence interval for the population variance is given
by:
(10 − 1)(.70) / 19.02 ≤ σ² ≤ (10 − 1)(.70) / 2.70
0.33 < σ² < 2.33
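A quick Python check of the variance interval, with the chi-square table values typed in by hand (the helper name is ours):

```python
def variance_interval(s2, n, chi2_hi, chi2_lo):
    """Interval for sigma^2: ((n-1)s^2/chi2_{a/2}, (n-1)s^2/chi2_{1-a/2})."""
    return (n - 1) * s2 / chi2_hi, (n - 1) * s2 / chi2_lo

# chi-square table values for 9 d.f.: chi2_.025 = 19.02, chi2_.975 = 2.70
lo, hi = variance_interval(s2=0.70, n=10, chi2_hi=19.02, chi2_lo=2.70)
```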
13
Example: Can Production
A quality control supervisor in a cannery knows that the exact
amount each can contains will vary, since there are certain
uncontrollable factors that affect the amount of fill. The mean
fill per can is important, but equally important is the variation
σ² of the amount of fill. If σ² is large, some cans will contain
too little and others too much. In order to estimate the
variation of fill at the cannery, the supervisor randomly
selects 10 cans and weighs the contents in each. The
following results are obtained:
x = 7.98 ounces, s = 0.04 ounces
Construct a 90% confidence interval for the true variation in
fill of cans at the cannery.
Example: Can Production
1 − α = 0.90 ⇒ α/2 = 0.05 ⇒ χ²0.05 = 16.9190, χ²0.95 = 3.3251 (9 degrees of freedom)
9 ⋅ 0.04² / 16.9190 ≤ σ² ≤ 9 ⋅ 0.04² / 3.3251 ⇒ 0.000851 ≤ σ² ≤ 0.004331
The quality control supervisor could use this interval to
check whether the variation of fill at the cannery is too
large and in violation of government regulations.
14
Interval Estimation of σ12/σ22
z
Interval Estimate of the ratio between Population Variances
P( (s1²/s2²) ⋅ 1/Fα/2(ν1, ν2)  ≤  σ1²/σ2²  ≤  (s1²/s2²) ⋅ Fα/2(ν2, ν1) ) = 1 − α
where the F values are based on a Fisher (F) distribution with
(ν1, ν2) degrees of freedom and where 1 − α is the
confidence coefficient.
The following random variable has a Fisher distribution:
F = (χ1²/ν1) / (χ2²/ν2)
  = [ ((n1 − 1)s1²/σ1²) / (n1 − 1) ] / [ ((n2 − 1)s2²/σ2²) / (n2 − 1) ]
  = (s1²/σ1²) / (s2²/σ2²)
Knowing that F1−α(ν1, ν2) = 1 / Fα(ν2, ν1), the interval follows
from P( F1−α/2 ≤ F ≤ Fα/2 ) = 1 − α.
Taking the square root of the upper and lower limits of the
interval provides the confidence interval for the ratio between
the population standard deviations.
End of Chapter 8
15
IX
Hypothesis
Testing
Hypothesis Testing
z
About concept of Hypothesis
z
Parametric Tests
z
Developing Null and Alternative Hypotheses
z
Type I and Type II Errors
z
Tests About a Population Mean:
Large-Sample Case
z
Tests About a Population Mean:
Small-Sample Case
z
Tests About a Population Proportion
z
Tests about Population Variance
z
Hypothesis Testing and Decision Making
z
Calculating the Probability of Type II Errors
z
Determining the Sample Size for a Hypothesis Test
1
Hypothesis Testing
About the nature of some event, several hypotheses H0, H1,
…, Hk could be established (developed).
Because of various reasons, one of them, H0, is important for
us; it is called the Null Hypothesis. The other hypotheses can
be considered as an Alternative Hypothesis HA.
To decide which hypothesis to accept, we conduct an ex-
periment (take a sample), obtaining a value T(x1, x2, …, xn)
We divide the sample space Ω (in our case Rn) into two
sets A and B = Ω - A and
if T ∈ A we accept H0
if T ∈ B we accept HA
In the ideal case:
P(T ∈ B/H0) = 0, Never reject H0 when it is true
P(T ∈ A/HA) = 0, Always reject H0 when it is false
(Never reject HA when it is true)
Hypothesis Testing
z
Unfortunately, such an ideal partition of the sample space into
sets A and B is not possible. So, we take a number α > 0
and choose the critical domain B such that
P(T ∈ B/H0) ≤ α (α is called the significance level and it gives
the probability of rejecting H0 when it is true.
Usually we take α = 0.05, 0.01, 0.001)
α gives the Type I error (rejecting H0 when it is true)
P(T ∈ A/HA) = β ⇒ 1−β = P(T ∈ B/HA) (1−β is called the
power of the test and it gives the probability
of correctly rejecting H0 when it is false)
β gives the Type II error (accepting H0 when it is false)
2
Parametric Tests
z
Let X be a random variable with distribution f(x,θ) which
depends on an unknown parameter θ. We test parameter θ
using a test sample in the following way:
H0: θ = θ0, HA: θ = θ1 < θ0 or
H0: θ = θ0, HA: θ = θ1 > θ0 or
H0: θ = θ0, HA: θ = θ1 ≠ θ0
z
The sample is a random vector T(x1, x2, …, xn) with PDF:
L(T) = ∏ f(xi, θ),  L0(T) = ∏ f(xi, θ0),  L1(T) = ∏ f(xi, θ1)   (products over i = 1, …, n)
z
Lemma (Neyman-Pearson): If there exist an area B in Rn and a number c such that
L0(T)/L1(T) < c when T ∈ B and L0(T)/L1(T) ≥ c when T ∉ B
then B is the best critical domain.
Parametric Tests
Let X be a random variable with N(μ,σ²) distribution, where
μ is unknown and σ can be replaced by s. We want to test:
H0: μ = μ0, HA: μ = μ1 < μ0
We take the sample (x1, x2, …, xn) and calculate
L0(T) = ∏ (1/(σ√(2π))) e^(−(xi−μ0)²/2σ²) = (1/(σⁿ(2π)^(n/2))) e^(−Σ(xi−μ0)²/2σ²)
and L1(T) analogously with μ1 in place of μ0.
The condition L0(T)/L1(T) = e^(−[Σ(xi−μ0)² − Σ(xi−μ1)²]/2σ²) < c when T ∈ B
reduces to
x = (1/n) Σ xi < [2σ² ln c + n(μ0² − μ1²)] / [2n(μ0 − μ1)] = c_new
The expression x < c_new is transformed into
(x − μ0)/(σ/√n) < (c_new − μ0)/(σ/√n)
and
P(x < c_new / H0) = α ⇒ P( (x − μ0)/(σ/√n) < (c_new − μ0)/(σ/√n) / H0 ) = α
so (c_new − μ0)/(σ/√n) = −zα gives the best critical domain.
Thus, if (x − μ0)/(σ/√n) < −zα, we reject H0 with level of significance α.
3
Null or Alternative ?
Hypothesis testing is similar to a criminal trial:
H0: The defendant is innocent
HA: The defendant is guilty
Usually the theory we want to support should be the alternative
hypothesis.
z
Testing Research Hypotheses
The research hypothesis should be expressed as the alternative.
The conclusion that the research hypothesis is true comes from
sample data that contradict the null hypothesis.
z
Testing the Validity of a Claim
Manufacturers' claims are usually given the benefit of the doubt
and stated as the null hypothesis.
The conclusion that the claim is false comes from sample data
that contradict the null hypothesis.
z
Testing in Decision-Making Situations
A decision maker might have to choose between two courses of
action, one associated with the null hypothesis and another
associated with the alternative hypothesis.
A Summary of Forms for Null
and Alternative Hypotheses
about a Population Mean
The equality part of the hypotheses always appears in
the null hypothesis.
In general, a hypothesis test about the value of a popula-
tion mean μ must take one of the following three forms
(where μ0 is the hypothesized value of the population
mean).
H0: μ ≥ μ0        H0: μ ≤ μ0        H0: μ = μ0
HA: μ < μ0        HA: μ > μ0        HA: μ ≠ μ0
4
Example: Metro EMS
A major west coast city provides one of the most comprehen-
sive emergency medical services in the world. Operating in a
multiple hospital system with approximately 20 mobile medical
units, the service goal is to respond to medical emergencies
with a mean time of 12 minutes or less.
The director of medical services wants to formulate a hypo-
thesis test that could use a sample of emergency response
times to determine whether or not the service goal of 12
minutes or less is being achieved.
Conclusion and Action
Hypotheses
H0: μ ≤ 12
The emergency service is meeting the response
goal; no follow-up action is necessary.
HA: μ > 12
The emergency service is not meeting the
response goal; follow-up action is necessary.
μ = mean response time of medical emergency requests.
Type I and Type II Errors
Since hypothesis tests are based on sample data, we
must allow for the possibility of errors.
A Type I error is rejecting H0 when it is true.
A Type II error is accepting H0 when it is false.
The person conducting the hypothesis test specifies
the maximum allowable probability of making a
Type I error, denoted by α and called the level of
significance.
Generally, we cannot control for the probability of
making a Type II error, denoted by β.
Statisticians avoid the risk of making a Type II error
by using "do not reject H0" rather than "accept H0".
5
Example: Metro EMS
                          Population Condition
Conclusion                H0 True (μ ≤ 12)     HA True (μ > 12)
Accept H0                 Correct              Type II
(Conclude μ ≤ 12)         Conclusion           Error
Reject H0                 Type I               Correct
(Conclude μ > 12)         Error                Conclusion
The Steps of Hypothesis Testing
z
Determine the appropriate hypotheses.
z
Select the test statistic for deciding whether or not to
reject the null hypothesis.
z
Specify the level of significance α for the test.
z
Use α to develop the rule for rejecting H0.
z
Collect the sample data and compute the value of the
test statistic.
z
a) Compare the test statistic to the critical value(s) in
the rejection rule, or
b) Compute the p-value based on the test statistic
and compare it to α to determine whether or not to
reject H0.
6
One-Tailed Tests about a
Population Mean: (n > 30)
z
Hypotheses
H0: μ ≤ μ0          or          H0: μ ≥ μ0
HA: μ > μ0                      HA: μ < μ0
z
Test Statistic
σ Known:   z = (x − μ0)/(σ/√n)
σ Unknown: z = (x − μ0)/(s/√n)
z
Rejection Rule
Reject H0 if z > zα   (for HA: μ > μ0)
Reject H0 if z < -zα  (for HA: μ < μ0)
Example: Metro EMS
z
One-Tailed Test about a Population Mean: Large n
Let α = P(Type I Error) = .05
[Figure: sampling distribution of x assuming H0 is true and μ = 12;
the rejection region of area α = .05 lies to the right of the
critical value c = 12 + 1.645 σx]
7
Example: Metro EMS
z
One-Tailed Test about a Population Mean: Large n
Let n = 40, x = 13.25 minutes, s = 3.2 minutes
(The sample standard deviation s can be used to
estimate the population standard deviation σ.)
z = (x − μ)/(σ/√n) = (13.25 − 12)/(3.2/√40) = 2.47
Since 2.47 > 1.645, we reject H0.
Conclusion: We are 95% confident that Metro EMS
is not meeting the response goal of 12 minutes;
appropriate action should be taken to improve
service.
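The test statistic and its p-value can be checked in Python, with the normal tail area obtained from the complementary error function (the helper name is ours):

```python
import math

def z_test_upper(xbar, mu0, s, n):
    """One-tailed (upper) large-sample test: returns z and its p-value
    P(Z > z), computed via the standard normal tail 0.5*erfc(z/sqrt(2))."""
    z = (xbar - mu0) / (s / math.sqrt(n))
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p

# Metro EMS: n = 40, mean 13.25 min, s = 3.2 min, alpha = .05
z, p = z_test_upper(xbar=13.25, mu0=12, s=3.2, n=40)
reject = z > 1.645
```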
Example: Metro EMS
z
Using the p-value to Test the Hypothesis
Recall that z = 2.47 for x = 13.25.
Then p-value = P(z > 2.47) = .0068.
Since p-value < α, that is .0068 < .05, we reject H0.
[Figure: standard normal curve; the p-value .0068 is the area
to the right of z = 2.47, beyond the critical value 1.645]
8
Two-Tailed Tests about a
Population Mean: (n > 30)
z
Hypotheses
H0: μ = μ0
HA: μ ≠ μ0
z
Test Statistic
σ Known:   z = (x − μ0)/(σ/√n)
σ Unknown: z = (x − μ0)/(s/√n)
z
Rejection Rule
Reject H0 if |z| > zα/2
Example: Glow Toothpaste
The production line for Glow toothpaste is designed to fill
tubes of toothpaste with a mean weight of 6 ounces.
Periodically, a sample of 30 tubes will be selected in order to
check the filling process. Quality assurance procedures call
for the continuation of the filling process if the sample results
are consistent with the assumption that the mean filling
weight for the population of toothpaste tubes is 6 ounces;
otherwise the filling process will be stopped and adjusted.
Hypotheses
H0: μ = 6, HA: μ ≠ 6
Rejection Rule
Assuming a .05 level of significance,
Reject H0 if z < -1.96 or if z > 1.96
9
Example: Glow Toothpaste
z
Two-Tailed Test about a Population Mean: Large n
[Figure: sampling distribution of x assuming H0 is true and μ = 6;
rejection regions of area α/2 = .025 lie below z = -1.96 and
above z = 1.96]
Example: Glow Toothpaste
z
Two-Tailed Test about a Population Mean: Large n
Assume that a sample of 30 toothpaste tubes provides a
sample mean of 6.1 ounces and a standard
deviation of 0.2 ounces.
Let n = 30, x = 6.1 ounces, s = .2 ounces
z = (x − μ0)/(s/√n) = (6.1 − 6)/(.2/√30) = 2.74
Since 2.74 > 1.96, we reject H0.
Conclusion: We are 95% confident that the mean filling
weight of the toothpaste tubes is not 6 ounces. The filling
process should be stopped and the filling mechanism
adjusted.
10
Example: Glow Toothpaste
z
Using the p-Value for a Two-Tailed Hypothesis Test
Suppose we define the p-value for a two-tailed test as double
the area found in the tail of the distribution.
With z = 2.74, the standard normal probability table shows
there is a .5000 - .4969 = .0031 probability of a difference
larger than .1 in the upper tail of the distribution.
Considering the same probability of a larger difference in
the lower tail of the distribution, we have
p-value = P(|z| > 2.74) = 2(.0031) = .0062
The p-value .0062 is less than α = .05, so H0 is rejected.
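The two-tailed version differs only in doubling the tail area; a minimal sketch (helper name is ours):

```python
import math

def z_test_two_tailed(xbar, mu0, s, n):
    """Two-tailed large-sample test: returns z and p-value = P(|Z| > |z|)."""
    z = (xbar - mu0) / (s / math.sqrt(n))
    p = math.erfc(abs(z) / math.sqrt(2))   # = 2 * upper-tail area
    return z, p

# Glow toothpaste: n = 30, mean 6.1 oz, s = .2 oz, alpha = .05
z, p = z_test_two_tailed(xbar=6.1, mu0=6, s=0.2, n=30)
reject = abs(z) > 1.96
```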
Confidence Interval Approach to
a Two-Tailed Test about a
Population Mean
Select a simple random sample from the population and use
the value of the sample mean x to develop the confidence
interval for the population mean μ.
If the confidence interval contains the hypothesized value μ0,
do not reject H0. Otherwise, reject H0.
The 95% confidence interval for μ is
x ± zα/2 ⋅ s/√n = 6.1 ± 1.96 (.2/√30) = 6.1 ± .0716
or 6.0284 to 6.1716
Since the hypothesized value for the population mean, μ0 = 6,
is not in this interval, the hypothesis-testing conclusion is that
the null hypothesis, H0: μ = 6, can be rejected.
11
Tests about a Population Mean:
Small-Sample Case (n < 30)
z
Test Statistic
σ Known:   z = (x − μ0)/(σ/√n)
σ Unknown: t = (x − μ0)/(s/√n)
The t statistic has a t distribution with n - 1 degrees of
freedom.
z
Rejection Rule
One-Tailed:  H0: μ ≤ μ0   Reject H0 if t > tα
             H0: μ ≥ μ0   Reject H0 if t < -tα
Two-Tailed:  H0: μ = μ0   Reject H0 if |t| > tα/2
Example: Highway Patrol
z
One-Tailed Test about a Population Mean: Small n
A State Highway Patrol periodically samples vehicle
speeds at various locations on a particular roadway.
The sample of vehicle speeds is used to test the
hypothesis
H0: μ ≤ 65.
The locations where H0 is rejected are deemed the best
locations for radar traps.
At Location F, a sample of 16 vehicles shows a mean
speed of 68.2 mph with a standard deviation of 3.8 mph.
Use α = .05 to test the hypothesis.
12
Example: Highway Patrol
z
One-Tailed Test about a Population Mean: Small n
Let n = 16, x = 68.2 mph, s = 3.8 mph
α = .05, d.f. = 16 - 1 = 15, tα = 1.753
t = (x − μ0)/(s/√n) = (68.2 − 65)/(3.8/√16) = 3.37
Since 3.37 > 1.753, we reject H0.
Conclusion: We are 95% confident that the mean speed of
vehicles at Location F is greater than 65 mph. Location F is
a good candidate for a radar trap.
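The small-sample t statistic is computed the same way as z, only compared against a t table value (helper name is ours):

```python
import math

def t_stat(xbar, mu0, s, n):
    """t statistic for a small-sample test about a mean (n - 1 d.f.)."""
    return (xbar - mu0) / (s / math.sqrt(n))

# Location F: n = 16, mean 68.2 mph, s = 3.8 mph
t = t_stat(xbar=68.2, mu0=65, s=3.8, n=16)
t_05_15df = 1.753            # t table, alpha = .05, 15 d.f.
reject = t > t_05_15df
```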
Summary of Test Statistics for
Population Mean
[Decision chart:
Sample size n ≥ 30:
  σ known → z = (x − μ)/(σ/√n)
  σ unknown → use s to estimate σ, z = (x − μ)/(s/√n)
Sample size n < 30, population approximately normal:
  σ known → z = (x − μ)/(σ/√n)
  σ unknown → use s to estimate σ, t = (x − μ)/(s/√n)
Sample size n < 30, population not approximately normal:
  increase n to > 30]
13
Tests about a Population Proportion:
Large-Sample Case (np > 5 and n(1 - p) > 5)
z
Test Statistic
z = (p̄ − p0)/σp   where   σp = √( p0(1 − p0)/n )
z
Rejection Rule
One-Tailed:  H0: p ≤ p0   Reject H0 if z > zα
             H0: p ≥ p0   Reject H0 if z < -zα
Two-Tailed:  H0: p = p0   Reject H0 if |z| > zα/2
14
Example: NSC
z
Two-Tailed Test about a Population Proportion: Large n
For a Christmas and New Year's week, the National Safety Council
estimated that 500 people would be killed and 25,000 injured on the
nation's roads. The NSC claimed that 50% of the accidents would
be caused by drunk driving. A sample of 120 accidents showed that
67 were caused by drunk driving. Use these data to test the NSC's
claim with α = 0.05.
Hypothesis H0: p = .5 ; HA: p ≠ .5
Test Statistic
σp = √( p0(1 − p0)/n ) = √( .5(1 − .5)/120 ) = .045644
z = (p̄ − p0)/σp = ((67/120) − .5)/.045644 = 1.278
Rejection Rule: Reject H0 if z < -1.96 or z > 1.96
Conclusion: Do not reject H0. For z = 1.278, the p-value is
.201. If we rejected H0, we would exceed the maximum allowed
risk of committing a Type I error (p-value > .050).
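A Python check of the proportion test statistic (the helper name is ours):

```python
import math

def proportion_test(p_hat, p0, n):
    """Large-sample test about a proportion; returns the z statistic."""
    sigma_p = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / sigma_p

# NSC claim: p0 = .5, sample 67 of 120 accidents, alpha = .05 two-tailed
z = proportion_test(p_hat=67 / 120, p0=0.5, n=120)
reject = abs(z) > 1.96
```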
Hypothesis Testing
About a Population Variance
z
Right-Tailed Test
Hypotheses:  H0: σ² ≤ σ0²   HA: σ² > σ0²
Test Statistic:  χ² = (n − 1)s² / σ0²
Rejection Rule:
Reject H0 if χ² > χ²α (chi-square distribution with n - 1 d.f.) or
reject H0 if p-value < α
z
Left-Tailed Test
Hypotheses:  H0: σ² ≥ σ0²   HA: σ² < σ0²
Test Statistic:  χ² = (n − 1)s² / σ0²
Rejection Rule:
Reject H0 if χ² < χ²(1−α) (chi-square distribution with n - 1 d.f.)
or reject H0 if p-value < α
15
Hypothesis Testing
About a Population Variance
z
Two-Tailed Test
Hypotheses:  H0: σ² = σ0²   HA: σ² ≠ σ0²
Test Statistic:  χ² = (n − 1)s² / σ0²
Rejection Rule:
Reject H0 if χ² < χ²(1−α/2) or χ² > χ²α/2 (where χ²(1−α/2) and
χ²α/2 are based on a chi-square distribution with n - 1 d.f.) or
reject H0 if p-value < α
Hypothesis Testing
About the Variances
z
One-Tailed Test
Hypotheses:  H0: σ1² ≤ σ2²   HA: σ1² > σ2²
Test Statistic:  F = s1²/s2²
Rejection Rule:
Reject H0 if F > Fα where the value of Fα is based on an F
distribution with n1 - 1 (numerator) and n2 - 1 (denominator) d.f.
z
Two-Tailed Test
Hypotheses:  H0: σ1² = σ2²   HA: σ1² ≠ σ2²
Test Statistic:  F = s1²/s2²
Rejection Rule:
Reject H0 if F > Fα/2 where the value of Fα/2 is based on an F
distribution with n1 - 1 (numerator) and n2 - 1 (denominator) d.f.
16
Example: Buyer’s Digest
Buyer's Digest has conducted the same test, as was described
earlier, on another 10 thermostats, this time manufactured by
TempKing. The temperature readings of the ten thermostats
are listed below.
We will conduct a hypothesis test with α = .10 to see if the
variances are equal for ThermoRite's thermostats and
TempKing's thermostats.
Therm.   1     2     3     4     5     6     7     8     9    10
Temp.  66.4  67.8  68.2  70.3  69.5  68.0  68.1  68.6  67.9  66.2
Example: Buyer’s Digest
z
Hypothesis Testing About the Variances of Two Populations
Hypotheses: H0: σ1² = σ2² (ThermoRite and TempKing thermo-
stats have the same temperature variance)
HA: σ1² ≠ σ2² (their variances are not equal)
Rejection Rule
The F distribution table shows that with α = .10, 9 d.f. (numer.),
and 9 d.f. (denom.), F.05 = 3.18. Reject H0 if F > 3.18
Test Statistic
ThermoRite's sample variance is .70.
TempKing's sample variance is 1.52. F = 1.52/.70 = 2.17
Conclusion
We cannot reject H0. There is insufficient evidence to conclude
that the population variances differ for the two thermostat brands.
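The F comparison above reduces to one division and a table lookup; a minimal sketch (the names are ours):

```python
def f_stat(s2_larger, s2_smaller):
    """F statistic for comparing two sample variances (larger over smaller)."""
    return s2_larger / s2_smaller

# TempKing (1.52) vs. ThermoRite (.70) sample variances
F = f_stat(1.52, 0.70)
f_05_9_9 = 3.18               # F table, alpha/2 = .05, (9, 9) d.f.
reject = F > f_05_9_9
```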
17
Hypothesis and Decision Making
In many decision-making situations the decision maker may want, and
in some cases may be forced, to take action with both the conclusion
do not reject H0 and reject H0. In such situations, it is recommended
that the hypothesis-testing procedure be extended to include
consideration of making a Type II error.
For a test with critical domain (x − μ0)/(σ/√n) > zα:
P(T ∈ B/HA) = P(x > c/HA) = P( (x − μ0)/(σ/√n) > zα / HA )
Under HA the quantity (x − μ0)/(σ/√n) does not have an N(0,1)
distribution; (x − μ1)/(σ/√n) does. So, with λ = (μ0 − μ1)/(σ/√n):
P( (x − μ1)/(σ/√n) + (μ0 − μ1)/(σ/√n) > zα + λ − λ + λ ... ) reduces to
P( (x − μ1)/(σ/√n) > zα + λ / HA ) = 1 − β
For a test with critical domain (x − μ0)/(σ/√n) < -zα:
P( (x − μ1)/(σ/√n) < -zα + λ / HA ) = 1 − β
For a two-tailed test, the acceptance region gives:
P( -zα/2 + λ < (x − μ1)/(σ/√n) < zα/2 + λ / HA ) = β
Probability of Type II Error β
Calculating the Probability of a Type II Error: Metro EMS
n = 40, x = 13.25 minutes, s = 3.2 minutes
1. Hypotheses are: H0: μ = 12 min and HA: μ ≠ 12 min
2. Rejection rule is: Reject H0 if |z| > z.025 = 1.96
3. Value of the sample mean that identifies the rejection region:
   z = (x − 12)/(3.2/√40) > 1.96 = z.025 ⇒ x > 12 + 1.96 ⋅ 3.2/√40 = 12.99
   (the observed mean gives z = (13.25 − 12)/(3.2/√40) = 2.4705 > 1.96)
4. We will accept H0 when x < 12.99
18
Example: Metro EMS (revisited)
Calculating the Probability of a Type II Error
We accept H0 when x < 12.99, so for a true mean μ1 the Type II
error probability is β = P( z < (12.99 − μ1)/(3.2/√40) ).
Values of μ1   z = (12.99 − μ1)/(3.2/√40)      β       1-β
14.0                    -2.00                .0228    .9772
13.6                    -1.21                .1131    .8869
13.2                    -0.42                .3372    .6628
12.99                    0.00                .5000    .5000
12.90                    0.18                .5714    .4286
12.4                     1.17                .8790    .1210
12.0001                  1.96                .9750    .0250
Recall: The probability of correctly rejecting H0 when it is false
is called the power of the test.
For any particular value of μ1, the power is 1 - β.
Example: Metro EMS (revisited)
z
Observations about the preceding table:
When the true population mean μ is close to the null hypothesis
value of 12, there is a high probability that we will make a Type II
error.
When the true population mean μ is far above the null
hypothesis value of 12, there is a low probability that we will
make a Type II error.
Relationship among α, β, and n



Once two of the three values are known, the other can be
computed.
For a given level of significance α, increasing the sample size n
will reduce β.
For a given sample size n, decreasing α will increase β, whereas
increasing α will decrease β.
19
Example: Highway Improvements
The Department of Highway Improvements, responsible for repairing
a 25-mile stretch of interstate highway, wants to design a surface that
will be structurally efficient. One important consideration is the volume
of heavy freight traffic. State weigh stations report that the average
number of heavy-duty trailers on the 25-mile segment is 72 per hour.
However, engineers believe that the volume of heavy freight traffic is
greater than the average reported. In order to validate this theory, the
department monitors the highway for 50 1-hour periods randomly
selected throughout the month. The sample mean and standard
deviation of the heavy freight traffic for the 50 sampled hours are:
x = 74.1, s = 13.3. Do the data support the department's theory? Use α = .10.
H0: μ = 72; HA: μ > 72;
z = (x − 72)/(s/√n) = (74.1 − 72)/(13.3/√50) = 1.12 < 1.28 = z.1 ⇒ H0 (state report) accepted
Example: Highway Improvements
If the number of heavy freight trucks is in fact 78 per hour, what is the
probability that the test procedure would fail to detect it?
1 − β = P( z > zα + λ / HA ) = P( z > 1.28 + (72 − 78)/(13.3/√50) / HA )
      = P( z > -1.91 ) = 0.9719
Therefore, the probability of accepting H0 when μ = 78 is only β = 0.0281.
If the number of heavy freight trucks is in fact 74 per hour, for β we have:
zα + λ = 1.28 + (72 − 74)/(13.3/√50) = 0.22
P( z > 0.22 ) = 0.4129 = 1 − β  ⇒  β = 0.588, a high Type II error.
[Figure: sampling distributions of x centered at μ = 72 and μ = 78,
with the observed mean 74.1 between them]
20
Determining the Sample Size
for a Hypothesis Test About a
Population Mean
n = (zα + zβ)² σ² / (μ0 − μa)²
where
zα = z value providing an area of α in the tail
zβ = z value providing an area of β in the tail
σ = population standard deviation
μ0 = value of the population mean in H0
μa = value of the population mean used for the
Type II error
Note: In a two-tailed hypothesis test, use zα/2 not zα
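As an illustration of the formula, a minimal sketch with hypothetical Metro EMS figures (α = .05, β = .10, σ = 3.2, μ0 = 12, μa = 14 are our assumed inputs, not from the slides):

```python
import math

def sample_size_test(z_alpha, z_beta, sigma, mu0, mua):
    """Sample size controlling both error probabilities in a one-tailed test:
    n = (z_alpha + z_beta)^2 sigma^2 / (mu0 - mua)^2, rounded up."""
    return math.ceil((z_alpha + z_beta) ** 2 * sigma ** 2 / (mu0 - mua) ** 2)

# hypothetical inputs: z.05 = 1.645, z.10 = 1.28
n = sample_size_test(z_alpha=1.645, z_beta=1.28, sigma=3.2, mu0=12, mua=14)
```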
End of Chapter 9
21