Probability Theory
Rosen 5th ed., Chapter 5
1
Random Experiment - Terminology
• A random (or stochastic) experiment is an
experiment whose result is not certain in
advance.
• An outcome is the result of a random
experiment.
• The sample space S of the experiment is the
set of possible outcomes.
• An event E is a subset of the sample space.
2
Examples
• Experiment: roll two dice
• Possible Outcomes: (1,2), (5,6) etc.
• Sample space: {(1,1) (1,2) … (6,6)}
(36 possible outcomes)
• Event: sum is 7
{(1,6) (6,1) (2,5) (5,2) (3,4) (4,3)}
3
Probability
• The probability p = Pr[E] ∈ [0,1] of an event E is
a real number representing our degree of certainty
that E will occur.
– If Pr[E] = 1, then E is absolutely certain to occur.
– If Pr[E] = 0, then E is absolutely certain not to occur.
– If Pr[E] = ½, then we are completely uncertain about
whether E will occur.
– What about other cases?
4
Four Definitions of Probability
• Several alternative definitions of probability
are commonly encountered:
– Frequentist, Bayesian, Laplacian, Axiomatic
• They have different strengths &
weaknesses.
• Fortunately, they coincide and work well
with each other in most cases.
5
Probability: Laplacian Definition
• First, assume that all outcomes in the sample
space are equally likely.
• Then, the probability of event E is given by:
Pr[E] = |E| / |S|
6
Example
• Experiment: roll two dice
• Sample space S: {(1,1) (1,2) … (6,6)}
(36 possible outcomes)
• Event E: sum is 7
{(1,6) (6,1) (2,5) (5,2) (3,4) (4,3)}
P(E) = |E| / |S| = 6 / 36 = 1/6
7
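The Laplacian computation above can be checked by brute-force enumeration; a minimal Python sketch (variable names are illustrative):

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered pairs of die faces.
S = list(product(range(1, 7), repeat=2))

# Event E: the sum of the two dice is 7.
E = [outcome for outcome in S if sum(outcome) == 7]

# Laplacian definition: Pr[E] = |E| / |S|.
p = Fraction(len(E), len(S))
print(p)  # 1/6
```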
Another Example
• What is the probability of winning the lottery
assuming that you have to correctly pick 6
numbers out of 40?
8
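Assuming a single ticket wins only if it matches all 6 numbers, and order does not matter, there are C(40, 6) equally likely combinations; a quick check:

```python
from math import comb
from fractions import Fraction

# Number of ways to choose 6 numbers out of 40 (order irrelevant).
total = comb(40, 6)
p_win = Fraction(1, total)  # one winning combination out of all of them
print(total, p_win)  # 3838380 1/3838380
```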
Probability of
Complementary Events
• Let E be an event in a sample space S.
• Then, Ē represents the complementary
event:
Pr[Ē] = 1 − Pr[E]
• Proof:
Pr[Ē] = (|S| − |E|)/|S| = 1 − |E|/|S| = 1 − Pr[E]
9
Example
• A sequence of ten bits is randomly
generated. What is the probability that at
least one of these bits is 0?
10
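Using the complement rule from the previous slide, the answer is 1 − Pr[all ten bits are 1]; a quick check:

```python
from fractions import Fraction

# Complement rule: Pr[at least one 0] = 1 - Pr[all ten bits are 1].
p_all_ones = Fraction(1, 2) ** 10
p_at_least_one_zero = 1 - p_all_ones
print(p_at_least_one_zero)  # 1023/1024
```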
Probability of Unions of Events
• Let E1, E2 ⊆ S, then:
Pr[E1 ∪ E2] = Pr[E1] + Pr[E2] − Pr[E1 ∩ E2]
– By the inclusion-exclusion principle.
11
Example
• What is the probability that a positive integer
selected at random from the set of positive
integers not exceeding 100 is divisible by either 2
or 5?
12
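The inclusion-exclusion computation (50 + 20 − 10 = 60 favorable outcomes) can be verified by brute force:

```python
from fractions import Fraction

S = range(1, 101)
E1 = {n for n in S if n % 2 == 0}   # divisible by 2: 50 numbers
E2 = {n for n in S if n % 5 == 0}   # divisible by 5: 20 numbers

# Inclusion-exclusion: |E1 ∪ E2| = |E1| + |E2| - |E1 ∩ E2|
p = Fraction(len(E1 | E2), 100)
print(p)  # 3/5
```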
Mutually Exclusive Events
• Two events E1, E2 are called mutually
exclusive if they are disjoint: E1 ∩ E2 = ∅
• Note that two mutually exclusive events
cannot both occur in the same instance of a
given experiment.
– If event E1 happens, then event E2 cannot, or
vice-versa.
• For mutually exclusive events,
Pr[E1 ∪ E2] = Pr[E1] + Pr[E2].
13
Example
• Experiment: Rolling a die once:
– Sample space S = {1,2,3,4,5,6}
– E1 = 'observe an odd number' = {1,3,5}
– E2 = 'observe an even number' = {2,4,6}
– E1 ∩ E2 = ∅, so E1 and E2 are mutually
exclusive.
14
Probability: Axiomatic Definition
• How to study probabilities of experiments where
outcomes may not be equally likely?
• Let S be the sample space of an experiment with a
finite or countable number of outcomes.
• Define a function p:S→[0,1], called a probability
distribution.
– Assign a probability p(s) to each outcome s where
p(s)= (# times s occurs) / (# times the experiment is
performed)
as the # of times → ∞
15
Probability: Axiomatic Definition
• Two conditions must be met:
• 0 ≤ p(s) ≤ 1 for all outcomes s ∈ S.
• ∑ p(s) = 1 (sum is over all s ∈ S).
• The probability of any event E ⊆ S is just:
Pr[E] = ∑ p(s) (sum is over all s ∈ E)
16
Example
• Suppose a die is biased (or loaded) so that 3 appears twice
as often as each other number but that the other five
outcomes are equally likely. What is the probability that an
odd number appears when we roll the die?
17
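Giving face 3 weight 2 and every other face weight 1 (total weight 7), the answer can be checked as:

```python
from fractions import Fraction

# Loaded die: face 3 appears twice as often as each other face.
p = {face: Fraction(2 if face == 3 else 1, 7) for face in range(1, 7)}
assert sum(p.values()) == 1      # a valid probability distribution

p_odd = p[1] + p[3] + p[5]       # Pr[odd] = p(1) + p(3) + p(5)
print(p_odd)  # 4/7
```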
Properties (same as before …)
Pr[Ē] = 1 − Pr[E]
Pr[E1 ∪ E2] = Pr[E1] + Pr[E2] − Pr[E1 ∩ E2]
• For mutually exclusive events,
Pr[E1 ∪ E2] = Pr[E1] + Pr[E2].
18
Independent Events
• Two events E, F are independent if the
occurrence of event E has no effect on the
occurrence of event F:
Pr[E ∩ F] = Pr[E]·Pr[F].
• Example: Flip a coin, and roll a die.
Pr[ quarter is heads ∩ die is 1 ] =
Pr[quarter is heads] × Pr[die is 1]
19
Example
• Are the events
E=“a family with three children has children of both sexes”,
and
F=“a family with three children has at most one boy”
independent?
20
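A brute-force check over the 8 equally likely birth sequences (assuming each child is independently equally likely to be a boy or a girl):

```python
from fractions import Fraction
from itertools import product

# All 2^3 = 8 equally likely gender sequences for three children.
S = list(product("BG", repeat=3))

E = [s for s in S if "B" in s and "G" in s]   # children of both sexes
F = [s for s in S if s.count("B") <= 1]       # at most one boy
EF = [s for s in E if s in F]

pE, pF, pEF = (Fraction(len(x), len(S)) for x in (E, F, EF))
print(pE, pF, pEF)          # 3/4 1/2 3/8
print(pEF == pE * pF)       # True: E and F are independent
```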
Mutually Exclusive
vs Independent Events
• If A and B are mutually exclusive events with
nonzero probabilities, they cannot be independent:
Pr[A ∩ B] = 0 while Pr[A]·Pr[B] > 0.
• Likewise, if A and B are independent events with
nonzero probabilities, they cannot be mutually
exclusive.
21
Conditional Probability
• The conditional probability of E given F
(assuming Pr[F] > 0), written Pr[E|F], is defined as
Pr[E|F] = Pr[E ∩ F]/Pr[F]
• P(Cavity) = 0.1
- In the absence of any other information, there is a 10%
chance that somebody has a cavity.
• P(Cavity | Toothache) = 0.8
- There is an 80% chance that somebody has a cavity given
that he has a toothache.
22
Example
• A bit string of length four is generated at random so that
each of the 16 bit strings of length four is equally likely.
What is the probability that it contains at least two
consecutive 0s, given that its first bit is a 0?
23
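The conditional probability can be verified by enumerating all 16 strings; with equally likely outcomes, Pr[E|F] reduces to |E ∩ F| / |F|:

```python
from fractions import Fraction
from itertools import product

# All 16 equally likely bit strings of length four.
S = ["".join(bits) for bits in product("01", repeat=4)]

F = [s for s in S if s[0] == "0"]   # first bit is 0
E = [s for s in S if "00" in s]     # at least two consecutive 0s
EF = [s for s in E if s in F]

p = Fraction(len(EF), len(F))       # Pr[E|F] = |E ∩ F| / |F|
print(p)  # 5/8
```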
Conditional Probability (cont’d)
• Note that if E and F are independent, then
Pr[E|F] = Pr[E]
• Using Pr[E|F] = Pr[E ∩ F]/Pr[F] we have:
Pr[E ∩ F] = Pr[E|F] Pr[F]
Pr[E ∩ F] = Pr[F|E] Pr[E]
• Combining them we get the Bayes rule:
Pr[E|F] = Pr[F|E] Pr[E] / Pr[F]
24
Bayes’s Theorem
• Useful for computing the probability that a
hypothesis H is correct, given data D:
Pr[Hypothesis | Data] =
Pr[Data | Hypothesis] · Pr[Hypothesis] / Pr[Data]
• Extremely useful in artificial intelligence apps:
– Data mining, automated diagnosis, pattern recognition,
statistical modeling, evaluating scientific hypotheses.
25
Example
• Consider the probability of Disease given Symptoms
P(Disease | Symptoms) =
P(Symptoms | Disease)·P(Disease) / P(Symptoms)
26
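A numeric sketch of this computation; the prior and likelihood values below are illustrative assumptions, not numbers from the slides:

```python
from fractions import Fraction

# Illustrative numbers (assumptions, not from the slides):
p_disease = Fraction(1, 100)          # prior P(Disease)
p_s_given_d = Fraction(9, 10)         # P(Symptoms | Disease)
p_s_given_not_d = Fraction(1, 10)     # P(Symptoms | no Disease)

# Total probability: P(Symptoms)
p_symptoms = p_s_given_d * p_disease + p_s_given_not_d * (1 - p_disease)

# Bayes rule: P(Disease | Symptoms)
posterior = p_s_given_d * p_disease / p_symptoms
print(posterior)  # 1/12
```

Note that even with a strong likelihood (0.9), the low prior keeps the posterior small; this is the typical lesson of Bayes's theorem in diagnosis.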
Random Variable
• A random variable V is a function (i.e., not a
variable) that assigns a value to each outcome!
27
Why Using Random Variables?
• It is easier to deal with a summary variable than with the
original probability structure.
• Example: in an opinion poll, we ask 50 people whether
they agree or disagree with a certain issue.
– Suppose we record a "1" for agree and "0" for disagree.
– The sample space for this experiment has 2^50 possible elements.
– Suppose we are only interested in the number of people who agree.
– Define the variable X = "number of 1's recorded out of 50".
– Easier to deal with this sample space - it has only 51 elements!
28
Random Variables
• How is the probability function of a r.v.
defined from the probability function of the
original sample space?
– Suppose the original sample space is:
S = {s1, s2, ..., sn}
– Suppose the range of the r.v. X is:
{x1, x2, ..., xm}
– We will observe X = xj iff the outcome of the random
experiment is an sj ∈ S such that X(sj) = xj:
P(X = xj) = ∑ P(sj) (sum is over all sj with X(sj) = xj)
29
Example
• X = “sum of dice”
– X=5 corresponds to E={(1,4) (4,1) (2,3) (3,2)}
P(X=5) = P(E) = ∑ P(s) (sum is over all s with X(s) = 5)
= P((1,4)) + P((4,1)) + P((2,3)) + P((3,2)) = 4/36 = 1/9
30
Expectation Values
• For a random variable X having a numeric
domain, its expected value or weighted average
value or arithmetic mean value E[X] is defined as
E(X) = ∑ x·P(X=x) (sum is over all values x)
• Provides a central point for the distribution of
values of the r.v.
31
Example
• X=“outcome of rolling a die”
E(X) = ∑ x·P(X=x) = 1· 1/6 + 2 ·1/6 + 3· 1/6 +
4 ·1/6 + 5 ·1/6 + 6· 1/6 = 3.5
32
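A quick check of this computation with exact fractions:

```python
from fractions import Fraction

# X = "outcome of rolling a fair die"; each face has probability 1/6.
E_X = sum(x * Fraction(1, 6) for x in range(1, 7))
print(E_X)  # 7/2, i.e. 3.5
```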
Another Example
• X=“sum of two dice”
P(X=2)=P(X=12)=1/36
P(X=4)=P(X=10)=3/36
P(X=6)=P(X=8)=5/36
P(X=3)=P(X=11)=2/36
P(X=5)=P(X=9)=4/36
P(X=7)=6/36
E(X) = ∑ x·P(X=x) = 2·1/36 + 3·2/36 + ...
+ 12·1/36 = 7
33
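The distribution and expectation above can be recomputed by enumerating the 36 equally likely outcomes:

```python
from fractions import Fraction
from itertools import product

# Distribution of X = "sum of two fair dice" over all 36 outcomes.
pX = {}
for a, b in product(range(1, 7), repeat=2):
    pX[a + b] = pX.get(a + b, 0) + Fraction(1, 36)

E_X = sum(x * p for x, p in pX.items())
print(pX[7], E_X)  # 1/6 7
```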
Linearity of Expectation
• Let X1, X2 be any two random variables
derived from the same sample space, then:
– E[X1+X2] = E[X1] + E[X2]
– E[aX1 + b] = aE[X1] + b
34
Example
• Find the expected value of the sum of the numbers
that appear when a pair of fair dice is rolled.
X1: “number appearing on first die”
X2: “number appearing on second die”
Y = X1 + X2 : “sum of dice”
E(Y) = E(X1+X2)=E(X1)+E(X2) = 3.5+3.5=7
35
Variance
• The variance Var[X] = σ²(X) of a r.v. X is
defined as:
Var(X) = E((X − μ)²) = E(X²) − μ²
where μ = E(X)
• The standard deviation or root-mean-square
(RMS) difference of X, σ(X) :≡ Var[X]^(1/2).
• Indicates how spread out the values of the r.v.
are.
36
Example
• X=“outcome of rolling a die”
E(X) = ∑ x·P(X=x) = 1· 1/6 + 2 ·1/6 + 3· 1/6 +
4 ·1/6 + 5 ·1/6 + 6· 1/6 = 3.5
Var(X) = E((X − μ)²)
= (1 − 3.5)²·1/6 + (2 − 3.5)²·1/6 + ... + (6 − 3.5)²·1/6 = 35/12
37
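A quick check of the arithmetic:

```python
from fractions import Fraction

# X = "outcome of rolling a fair die"; each face has probability 1/6.
mu = sum(x * Fraction(1, 6) for x in range(1, 7))               # E(X)
var = sum((x - mu) ** 2 * Fraction(1, 6) for x in range(1, 7))  # E((X - mu)^2)
print(mu, var)  # 7/2 35/12
```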
Properties of Variance
Var(aX + b) = a²·Var(X)
• Let X1, X2 be any two independent random
variables derived from the same sample space,
then:
Var[X1+X2] = Var[X1] + Var[X2]
38
Example
• Suppose two dice are rolled and X is a r.v.
X = “sum of dice”
• Find Var(X)
39
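The answer can be checked both directly from the definition and via the independence property Var(X1 + X2) = Var(X1) + Var(X2) = 35/12 + 35/12 = 35/6:

```python
from fractions import Fraction
from itertools import product

# X = "sum of two fair dice"; enumerate all 36 equally likely outcomes.
sums = [a + b for a, b in product(range(1, 7), repeat=2)]
mu = Fraction(sum(sums), 36)                  # E(X) = 7
var = sum((s - mu) ** 2 for s in sums) / 36   # Var(X) by direct definition
print(mu, var)  # 7 35/6
```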