Lecture notes: intro, combinatorics and conditional probability
Introduction to Probability
Sample Space S: the collection of all possible outcomes of a stochastic experiment (the outcome is determined by chance, coincidence).
s is an outcome in S: s ∈ S (s is an element of S).
Event A: a subset of (outcomes of) the sample space S.
Mathematical notation: A ⊂ S
The probability P(A) of an event A
can be interpreted as the relative
frequency of the event A, when we
repeat the experiment very often.
Laws of probability:
1. 0 ≤ P(A) ≤ 1: a probability is
always between 0% and 100%.
2. P(S) = 1: the probability of all
outcomes is 1.
Finite sample space:
If 1. S consists of a finite number of outcomes and 2. every outcome in S is equally likely to occur,
then the probability of the event A (⊂ S) can be computed by counting the number of outcomes in A and S, respectively (the Laplace definition):

P(A) = (nr. of outcomes in A) / (nr. of outcomes in S) = “favorable” / “total”
Complement Ā (or Aᶜ) of an event A (say “not-A” or “A does not occur”):
Ā is the set of all outcomes not in A.
Complement law: P(Ā) = 1 – P(A)

[Venn diagram: the sample space S split into A and Ā]
The Intersection (A and B): the collection of all common outcomes of A and B. Notation: A∩B or AB

[Venn diagram: A and B overlapping in A∩B]
Union (A or B): A occurs or B occurs or both occur.
Notation: A∪B
General addition law:
P(A∪B) = P(A) + P(B) – P(A∩B)
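A short sketch checking the addition law by counting, assuming two fair dice as an example:

```python
from fractions import Fraction

# Two dice; A = "first die shows 6", B = "the sum is 7".
S = [(i, j) for i in range(1, 7) for j in range(1, 7)]
A = {s for s in S if s[0] == 6}
B = {s for s in S if sum(s) == 7}

def p(E):
    return Fraction(len(E), len(S))

# General addition law: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
assert p(A | B) == p(A) + p(B) - p(A & B)
print(p(A | B))  # 11/36
```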
Mutually exclusive (disjoint) events A and B do not have common outcomes: the intersection is empty, so that P(A∩B) = 0.
Addition law for mutually exclusive events:
P(A∪B) = P(A) + P(B)
[Venn diagram: disjoint events A and B in S]
The relative frequency as an intuitive definition of probability:
if we repeat an experiment, under the same conditions, n times and the event A occurs n(A) times, then the probability of A can be estimated with:

P(A) ≈ n(A)/n

Intuitive property of this relative frequency (frequency ratio): n(A)/n approaches P(A) as n grows.
This experimental law of large numbers cannot be proven formally, but the relative frequency has the same properties as the probability P.
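A small simulation sketch of this idea, assuming a fair die and the event “even” (so the estimate should settle near 0.5):

```python
import random

# Estimate P("even") for a fair die by the relative frequency n(A)/n.
random.seed(1)
n = 100_000
n_A = sum(1 for _ in range(n) if random.randint(1, 6) % 2 == 0)
print(n_A / n)  # close to the exact value 0.5 for large n
```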
The theoretical background of
probabilities:
the axioms of Kolmogorov
The probability P can be seen as a function that assigns a real (non-negative) number to every event A:
P: A → R
Kolmogorov stated his axioms for this probability measure P:
1. P(A) ≥ 0, for every event A (⊂ S)
2. P(S) = 1
3. For every sequence of mutually exclusive events A₁, A₂, …:

P(∪ᵢ Aᵢ) = Σᵢ P(Aᵢ)
All the desirable properties of the probability measure P can be proven using only the axioms, e.g.:
1. P(∅) = 0 (∅ is the null set or empty event)
2. If A ⊂ B, then P(A) ≤ P(B)
3. P(A∪B) = P(A) + P(B) – P(A∩B)
Combinatorics, the art of counting:
“Laplace”: for finitely many equally likely outcomes, P(A) = #A / #S
Basic principle of counting: if an experiment has m outcomes and for each of these outcomes a second experiment has n outcomes, then altogether there are m·n outcomes.
n (different) elements have n! = n(n−1)···2·1 rankings (orders, permutations).
The number of permutations of r elements chosen out of n elements, without replacement, is
n(n−1)···(n − r + 1) = n!/(n−r)!
Permutations with replacement: nʳ
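A brief check of these counts, assuming n = 5 and r = 2 as an example:

```python
import math
from itertools import permutations

n, r = 5, 2
# Without replacement: n!/(n-r)! ordered arrangements.
assert len(list(permutations(range(n), r))) == math.perm(n, r) == 20
# With replacement: n**r ordered arrangements.
print(math.perm(n, r), n ** r)  # 20 25
```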
The number of combinations of r elements chosen out of n elements, without replacement, is the binomial coefficient (“n over r”):

C(n,r) = n! / (r!(n−r)!)
Combinatorics of random samples of size r chosen out of n elements:

Outcomes     | Without replacement       | With replacement
Ordered      | n!/(n−r)! (permutations)  | nʳ
Not ordered  | C(n,r) (combinations)     | C(n+r−1, r)

In the last case (not ordered, with replacement) the outcomes are not equally likely.
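A sketch that reproduces all four cells of the table by brute force, assuming n = 4 and r = 2:

```python
import math
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)

n, r = 4, 2
elems = range(n)
# Ordered, without replacement: n!/(n-r)!
print(len(list(permutations(elems, r))), math.perm(n, r))             # 12 12
# Ordered, with replacement: n**r
print(len(list(product(elems, repeat=r))), n ** r)                    # 16 16
# Not ordered, without replacement: C(n, r)
print(len(list(combinations(elems, r))), math.comb(n, r))             # 6 6
# Not ordered, with replacement: C(n+r-1, r)
print(len(list(combinations_with_replacement(elems, r))),
      math.comb(n + r - 1, r))                                        # 10 10
```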
The number of divisions of n elements in k distinct groups of sizes n₁, …, n_k (with n₁ + … + n_k = n) is the multinomial coefficient:

n! / (n₁! ··· n_k!)
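A small helper computing the multinomial coefficient; the function name and the group sizes 5, 3, 2 are an assumed example:

```python
import math

def multinomial(*group_sizes):
    # n! / (n1! * ... * nk!), with n = n1 + ... + nk
    result = math.factorial(sum(group_sizes))
    for size in group_sizes:
        result //= math.factorial(size)
    return result

# Divide 10 elements into groups of sizes 5, 3 and 2:
print(multinomial(5, 3, 2))  # 2520
```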
The probability of r red balls, if we choose at random n balls out of an urn containing R red balls and N−R white balls (N balls in total), is given by the hypergeometric formula:

P(r red) = C(R,r)·C(N−R, n−r) / C(N,n)
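A sketch of the hypergeometric formula; the function and the numbers N = 20, R = 5, n = 4, r = 2 are assumed for illustration:

```python
import math

def hypergeometric(r, n, R, N):
    # P(r red) = C(R, r) * C(N-R, n-r) / C(N, n)
    return math.comb(R, r) * math.comb(N - R, n - r) / math.comb(N, n)

# Urn with N = 20 balls, R = 5 red; draw n = 4, ask for r = 2 red:
print(hypergeometric(2, 4, 5, 20))  # ≈ 0.2167
```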
Conditional probabilities:
The conditional probability of A, under the condition that B has occurred, is defined as:

P(A|B) = P(A∩B) / P(B)

(provided that P(B) ≠ 0)
A conditional probability P(.|B) is
also a probability measure as defined
by Kolmogorov (if P(B) > 0)
The multiplication rule follows from this definition:
P(A∩B) = P(A|B)·P(B)
and
P(A∩B) = P(B|A)·P(A)
Multiplication rule for 3 events (can be extended!):
P(A∩B∩C) = P(A)·P(B|A)·P(C|A∩B)
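A short check of the definition and the multiplication rule on the two-dice space (an assumed example, as before):

```python
from fractions import Fraction

# Two dice; A = "the sum is 8", B = "the first die is even".
S = [(i, j) for i in range(1, 7) for j in range(1, 7)]
A = {s for s in S if sum(s) == 8}
B = {s for s in S if s[0] % 2 == 0}

def p(E):
    return Fraction(len(E), len(S))

p_A_given_B = p(A & B) / p(B)          # P(A|B) = P(A∩B) / P(B)
assert p(A & B) == p_A_given_B * p(B)  # multiplication rule
print(p_A_given_B)  # 1/6
```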
Consider the following situation:
• The population consists of 2 parts, A and Ā
• We know the probability P(A) (and P(Ā) = 1 – P(A)!)
• We know P(B|A) and P(B|Ā)
In a Venn diagram:

[Venn diagram: B overlaps both A and Ā, giving the pieces A∩B and Ā∩B]
Note that it follows from the product law that
P(A∩B) = P(B|A)·P(A)
and
P(Ā∩B) = P(B|Ā)·P(Ā)
Law of total probability:
P(B) = P(A∩B) + P(Ā∩B)
=> P(B) = P(A)·P(B|A) + P(Ā)·P(B|Ā)
Bayes’ rule:

P(A|B) = P(A∩B) / P(B) = P(B|A)·P(A) / [ P(B|A)·P(A) + P(B|Ā)·P(Ā) ]
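A numeric sketch of Bayes’ rule; the probabilities below are assumed for illustration only:

```python
# Hypothetical numbers: P(A) = 0.01, P(B|A) = 0.95, P(B|Ā) = 0.05,
# e.g. a rare condition A and a positive test result B.
p_A = 0.01
p_B_given_A = 0.95
p_B_given_notA = 0.05

p_B = p_A * p_B_given_A + (1 - p_A) * p_B_given_notA  # law of total probability
p_A_given_B = p_B_given_A * p_A / p_B                 # Bayes' rule
print(round(p_A_given_B, 3))  # 0.161
```

Note how a small prior P(A) keeps P(A|B) modest even for a fairly accurate test.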
In a “probability tree”:

[Probability tree: the first branch asks “A occurs?” with probabilities P(A) (yes) and P(Ā) (no); each branch then asks “B occurs?” with probabilities P(B|A) and P(B̄|A), respectively P(B|Ā) and P(B̄|Ā)]

So:
P(B) = P(A)×P(B|A) + P(Ā)×P(B|Ā)
--------------------------------------------
Generalization to n mutually exclusive parts A₁, …, Aₙ of S:

Law of total probability / Bayes’ rule:

P(B) = Σᵢ₌₁ⁿ P(Aᵢ)·P(B|Aᵢ)

and

P(A₁|B) = P(B|A₁)·P(A₁) / Σᵢ₌₁ⁿ P(B|Aᵢ)·P(Aᵢ)

[Venn diagram: S partitioned into A₁, A₂, …, Aₙ; B intersects each part in A₁∩B, A₂∩B, …, Aₙ∩B]
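The same computation for n parts, with assumed numbers for n = 3:

```python
# Hypothetical numbers for 3 mutually exclusive parts A1, A2, A3 of S:
priors = [0.5, 0.3, 0.2]       # P(A1), P(A2), P(A3)
likelihoods = [0.9, 0.6, 0.1]  # P(B|A1), P(B|A2), P(B|A3)

p_B = sum(pa * l for pa, l in zip(priors, likelihoods))            # total probability
posteriors = [l * pa / p_B for pa, l in zip(priors, likelihoods)]  # Bayes' rule
print(p_B, posteriors)  # 0.65, and the posteriors sum to 1
```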
A and B are called independent
when knowledge about the
occurrence of A does not influence
the probability of B (and vice versa)
Definition:
A and B are independent ⟺ P(A∩B) = P(A)×P(B)
The independence of A and B implies, for example, that
P(B|A) = P(B) and P(A|B) = P(A)
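A quick check of the definition, again assuming two fair dice as an example:

```python
from fractions import Fraction

# Two dice; A = "first die is even", B = "second die shows 3".
S = [(i, j) for i in range(1, 7) for j in range(1, 7)]
A = {s for s in S if s[0] % 2 == 0}
B = {s for s in S if s[1] == 3}

def p(E):
    return Fraction(len(E), len(S))

print(p(A & B) == p(A) * p(B))  # True: A and B are independent
```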
Three events A, B and C are independent iff
P(A∩B∩C) = P(A)×P(B)×P(C)
and they are pairwise independent:
P(A∩B) = P(A)×P(B)
P(A∩C) = P(A)×P(C)
P(B∩C) = P(B)×P(C)
This definition is extendable to n independent events.
Sometimes we use the definition to prove independence, but usually we can assume independence because of the nature of the situation, and then use the formulas above.
When 2 experiments can be assumed independent, then 2 events, one defined on each experiment, are independent.
Bernoulli trials are repeated independent experiments with only two outcomes, “success” and “failure” (probabilities p and 1−p).
If we repeat the Bernoulli trials until the first success occurs, we find the geometric formula:

P(first success at the kth trial) = (1−p)ᵏ⁻¹·p

This formula applies for all k = 1, 2, ...
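A simulation sketch of the geometric formula, with assumed values p = 0.3 and k = 3:

```python
import random

# Check P(first success at trial k) = (1-p)**(k-1) * p by simulation.
random.seed(1)
p, k, n = 0.3, 3, 100_000

def first_success():
    trial = 1
    while random.random() >= p:
        trial += 1
    return trial

freq = sum(first_success() == k for _ in range(n)) / n
print(freq, (1 - p) ** (k - 1) * p)  # both ≈ 0.147
```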