Basic Concepts of Discrete Probability
Elements of Probability Theory (continuation)
Bayes’ Theorem
• Bayes’ theorem (1763) solves the following problem: from a number of observations of the occurrence of an effect, one can make an estimate of the occurrence of the cause leading to that effect (it is also called the rule of inverse probability).
Bayes’ Theorem
• Let A1 and A2 be two mutually exclusive and exhaustive events:
$$A_1 \cap A_2 = \emptyset, \qquad A_1 \cup A_2 = U$$
• Let both A1 and A2 have a subevent ($EA_1$ and $EA_2$, respectively).
• The event $E = EA_1 \cup EA_2$ is of our special interest. It can occur only when A1 or A2 occurs.
Bayes’ Theorem
• Suppose we are given the information that E has occurred, and that the conditional probabilities P{E|A1} and P{E|A2} (a priori probabilities) are known.
• Bayes’ problem is formulated as follows: how likely is it that A1 or A2 has occurred because of the occurrence of E (a posteriori probabilities)?
Bayes’ Theorem
$$P\{A_1\} = \alpha_1; \qquad P\{A_2\} = \alpha_2$$
$$A_1 \cup A_2 = U; \qquad A_1 \cap A_2 = \emptyset$$
$$P\{E \mid A_1\} = p_1; \qquad P\{E \mid A_2\} = p_2 \qquad \text{(a priori probabilities)}$$
$$P\{A_1 \mid E\} = ?; \qquad P\{A_2 \mid E\} = ? \qquad \text{(a posteriori probabilities)}$$
Bayes’ Theorem
• Since $E = EA_1 \cup EA_2$, then for the mutually exclusive events:
$$P\{E\} = P\{EA_1\} + P\{EA_2\} = P\{A_1\}P\{E \mid A_1\} + P\{A_2\}P\{E \mid A_2\} = \alpha_1 p_1 + \alpha_2 p_2$$
$$P\{A_1 \mid E\} = \frac{P\{A_1 E\}}{P\{E\}} = \frac{P\{A_1\}P\{E \mid A_1\}}{P\{A_1\}P\{E \mid A_1\} + P\{A_2\}P\{E \mid A_2\}} = \frac{\alpha_1 p_1}{\alpha_1 p_1 + \alpha_2 p_2}$$
$$P\{A_2 \mid E\} = \frac{P\{A_2 E\}}{P\{E\}} = \frac{P\{A_2\}P\{E \mid A_2\}}{P\{A_1\}P\{E \mid A_1\} + P\{A_2\}P\{E \mid A_2\}} = \frac{\alpha_2 p_2}{\alpha_1 p_1 + \alpha_2 p_2}$$
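A minimal numerical sketch of the two-event formulas follows; the priors and likelihoods below are illustrative assumptions, not values from the slides.

```python
# Illustrative a priori probabilities P{A1}, P{A2} and likelihoods P{E|A1}, P{E|A2}.
alpha1, alpha2 = 0.3, 0.7
p1, p2 = 0.9, 0.2

p_e = alpha1 * p1 + alpha2 * p2               # total probability P{E}
post_a1 = alpha1 * p1 / p_e                   # a posteriori probability P{A1|E}
post_a2 = alpha2 * p2 / p_e                   # a posteriori probability P{A2|E}
print(post_a1, post_a2, post_a1 + post_a2)    # the two posteriors sum to 1
```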
Bayes’ Theorem (general case)
• Let $E \subseteq \bigcup_{k=1}^{n} A_k$; $P\{E \mid A_k\} = p_k$; $P\{A_k\} = \alpha_k$, $k = 1, \ldots, n$.
• Then
$$P\{A_k \mid E\} = \frac{P\{A_k\} P\{E \mid A_k\}}{P\{E\}} = \frac{P\{A_k\} P\{E \mid A_k\}}{\sum_{i=1}^{n} P\{E \mid A_i\} P\{A_i\}} = \frac{\alpha_k p_k}{\sum_{i=1}^{n} \alpha_i p_i}$$
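The general formula can be sketched directly in Python; the function name bayes_posteriors and the example numbers below are ours, chosen for illustration only.

```python
# Given priors alpha_k and likelihoods p_k = P{E|A_k}, return the posteriors P{A_k|E}.
def bayes_posteriors(alphas, likelihoods):
    total = sum(a * p for a, p in zip(alphas, likelihoods))     # P{E}
    return [a * p / total for a, p in zip(alphas, likelihoods)]

# Example with three hypotheses (illustrative numbers):
print(bayes_posteriors([0.5, 0.3, 0.2], [0.1, 0.6, 0.9]))
```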
Bayes’ Theorem and Communication Channels
Consider again the problem of sending a bit of information from sender to receiver:
[Channel diagram: bit 0 is received as 0 with probability 1–p and as 1 with probability p; bit 1 is received as 1 with probability 1–p and as 0 with probability p.]
Before a bit b ∈ {0,1} has been transmitted, the receiver has no information: p(0) = p(1) = ½. The transmission of the bit value changes these probabilities: if the bit value b′ = 0 has been received, we assign a higher probability to the bit value having been b = 0 rather than b = 1. This probability is calculated using Bayes’ theorem.
Bayes’ Theorem and Communication Channels
Let us apply Bayes’ theorem to the noisy channel where the sender’s bit is the random variable X and the received bit is Y.
[Channel diagram with p = 0.1: X = 0 is received as Y = 0 with probability 0.9 and as Y = 1 with probability 0.1; X = 1 is received as Y = 1 with probability 0.9 and as Y = 0 with probability 0.1.]
1) Take p = 0.1 and use the channel without error correction. We have that P{X=0|Y=0} = P{X=1|Y=1} = 0.9 and P{X=1|Y=0} = P{X=0|Y=1} = 0.1.
2) If we use the code in which we send each bit 3 times, we get P{X=0|Y=0} = P{X=1|Y=1} = 0.972 and P{X=1|Y=0} = P{X=0|Y=1} = 0.028.
Thus, the information given by the Bayes posterior distributions P{X|Y} is in any case less random than (½, ½).
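A short sketch that reproduces these numbers, assuming equal priors, a symmetric per-bit error probability p = 0.1, and majority-vote decoding for the 3-fold repetition code:

```python
p = 0.1                     # per-bit error probability of the channel
prior0 = prior1 = 0.5       # P{X=0} = P{X=1} = 1/2 before transmission

# 1) Single transmission: posterior of X = 0 given that 0 was received.
like0 = 1 - p               # P{Y=0 | X=0}
like1 = p                   # P{Y=0 | X=1}
post0 = prior0 * like0 / (prior0 * like0 + prior1 * like1)
print(post0)                # 0.9

# 2) Send each bit 3 times and decode by majority vote.
maj_ok  = (1 - p)**3 + 3 * (1 - p)**2 * p    # P{majority = 0 | X = 0} = 0.972
maj_err = 3 * (1 - p) * p**2 + p**3          # P{majority = 0 | X = 1} = 0.028
post0_rep = prior0 * maj_ok / (prior0 * maj_ok + prior1 * maj_err)
print(post0_rep)            # 0.972
```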
Random Variables
• A random variable is a real-valued function defined over the sample space of a random experiment: $X : \Omega \to \Psi$, $\Psi \subseteq \mathbb{R}$.
• A random variable is called discrete if its range is either finite or countably infinite.
• A random variable establishes a correspondence between a point of Ω and a point in the “coordinate space” associated with the corresponding experiment.
Discrete Probability Function and Distribution
• Any discrete random variable X assumes different values in the coordinate space: $x_1, x_2, \ldots, x_n, \ldots$
• The probability distribution function (the cumulative distribution function, CDF) is defined as
$$F(x) = \sum_{x_i \le x} f(x_i),$$
where $f(x_k) = P\{X = x_k\} = p_k$ is the probability function.
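As a minimal sketch of these definitions, the following uses a hypothetical random variable taking the values 1, 2, 3 with assumed probabilities:

```python
pmf = {1: 0.2, 2: 0.5, 3: 0.3}            # f(x_k) = P{X = x_k} = p_k

def cdf(x, pmf=pmf):
    # F(x) = sum of f(x_i) over all x_i <= x
    return sum(p for xi, p in pmf.items() if xi <= x)

print(cdf(0), cdf(1), cdf(2.5), cdf(3))   # 0, 0.2, 0.7, 1.0
```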
Discrete Probability Function
• Thus, the discrete random variable X ‘produces’ letters x from a countable (typically finite) alphabet Ψ with the probability function p(x):
f(x) = P{X=x} with x ∈ Ψ;  f(x′) = P{X=x′} with x′ ∈ Ψ;  f(x″) = P{X=x″} with x″ ∈ Ψ; …
Discrete Probability Distribution Function (CDF)
• The following properties of the CDF follow from the axioms of probability:
• F(x) is a nondecreasing function: if $x_1 \le x_2$ then $F(x_1) \le F(x_2)$.
• $\lim_{x \to +\infty} F(x) = 1$; $\lim_{x \to -\infty} F(x) = 0$.
• $P\{x_i < X \le x_j\} = F(x_j) - F(x_i)$ for every $x_i \le x_j$.
Bivariate Discrete Distribution
• In most engineering problems the interrelation between two random quantities (pairs of values $(x_j, y_k)$, i.e. a vector-valued random variable) leads to a bivariate discrete distribution.
• The joint probability function and distribution function (CDF) are, respectively:
$$f(x, y) = P\{X = x,\, Y = y\}$$
$$F(x, y) = P\{X \le x,\, Y \le y\}$$
Bivariate Discrete Distribution
• The marginal probability function and distribution function (CDF) are, respectively:
$$f_1(x_i) = P\{X = x_i, \text{ all permissible } Y\text{'s}\} = \sum_{y} f(x_i, y)$$
$$f_2(y_i) = P\{Y = y_i, \text{ all permissible } X\text{'s}\} = \sum_{x} f(x, y_i)$$
$$F_1(x_i) = \sum_{x_k \le x_i} f_1(x_k), \qquad F_2(y_i) = \sum_{y_k \le y_i} f_2(y_k)$$
Bivariate Discrete Distribution
• The marginal probability $f_1(x_i)$ is the probability of the occurrence of those events for which $X = x_i$, without regard to the value of Y.
• If the random variables X and Y are such that for all $i, j$
$$f(x_i, y_j) = P\{X = x_i,\, Y = y_j\} = f_1(x_i) f_2(y_j),$$
then the variables X and Y are said to be statistically independent.
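A small sketch of the marginal probability functions and of the independence check, for a hypothetical joint table f(x, y) chosen so that it factorizes:

```python
joint = {(0, 0): 0.12, (0, 1): 0.28,
         (1, 0): 0.18, (1, 1): 0.42}       # f(x, y) = P{X=x, Y=y}

f1 = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}
f2 = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in (0, 1)}

# X and Y are statistically independent iff f(x, y) = f1(x) f2(y) for all pairs.
independent = all(abs(joint[(x, y)] - f1[x] * f2[y]) < 1e-12 for (x, y) in joint)
print(f1, f2, independent)                 # this particular table factorizes, so True
```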
Combinatorics and Probability
• For example, suppose engineering students have Calculus (C), Physics (P), and Information Theory (I) classes today. How can we calculate the probability that I is the last class?
• The following 6 arrangements are possible: CPI, CIP, PCI, PIC, ICP, IPC. Two of them are desirable: CPI and PCI. Thus, if all arrangements are equiprobable, the probability is 2/6 = 1/3.
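The same answer can be checked by brute-force enumeration, a sketch using the standard library:

```python
from itertools import permutations

orders = list(permutations("CPI"))                 # all 6 arrangements of the classes
favourable = [o for o in orders if o[-1] == "I"]   # arrangements with I as the last class
print(len(favourable) / len(orders))               # 2/6 = 1/3
```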
Combinatorics and Probability
• Suppose engineering students take Calculus (C), Physics (P), and Information Theory (I) classes during this semester, with two classes per day. How can we calculate the probability that I and P are taken on the same day and P is the first class?
• There are 6 different arrangements of 2 objects selected from 3: CP, PC, CI, IC, IP, PI. One of them is desirable: PI. Thus, the probability is 1/6.
Combinatorics and Probability
• The number of different permutations of n objects is $P(n) = n!$
• The number of different (ordered) arrangements of r objects selected from n is the number of all possible permutations of n objects (n!) divided by the number of all possible permutations of n−r objects ((n−r)!):
$$A_r^n = \frac{n!}{(n-r)!}$$
Combinatorics and Probability
• Suppose engineering students take Calculus (C), Physics (P), and Information Theory (I) classes during this semester, with two classes per day. How can we calculate the probability that I and P are taken on the same day?
• There are 3 different combinations of 2 objects selected from 3: (CP=PC), (CI=IC), (IP=PI). One of them is desirable: (IP=PI). Thus, the probability is 1/3.
Combinatorics and Probability
• The number of different (unordered) combinations of r objects selected from n is the number of all possible arrangements of r objects selected from n ($A_r^n$) divided by the number of all possible permutations of r objects (r!):
$$C_r^n = \frac{A_r^n}{r!} = \frac{n!}{r!\,(n-r)!}$$
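A sketch of these three counting formulas using the Python standard library (math.perm and math.comb require Python 3.8 or later); n = 5 and r = 2 are illustrative values:

```python
from math import factorial, perm, comb

n, r = 5, 2
print(factorial(n))    # P(5)   = 120  (permutations of n objects)
print(perm(n, r))      # A_2^5  = 20   (ordered arrangements, n!/(n-r)!)
print(comb(n, r))      # C_2^5  = 10   (unordered combinations, n!/(r!(n-r)!))
```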
Combinatorics and Probability
• Binomial meaning: as discovered by I. Newton, $C_r^n$, $r = 0, \ldots, n$, are the coefficients of the binomial expansion:
$$(a+b)^n = C_0^n a^n + C_1^n a^{n-1} b + C_2^n a^{n-2} b^2 + \ldots + C_r^n a^{n-r} b^r + \ldots + C_n^n b^n$$
Binomial Distribution
• Let a random experiment have only two possible outcomes E1 and E2. Let the probabilities of their occurrence be p and q = 1−p, respectively. If the experiment is repeated n times and successive trials are independent of each other, the probability of obtaining E1 and E2 exactly r and n−r times, respectively, is $C_r^n p^r q^{n-r}$.
Binomial Distribution
• Let a random variable X take the value r if in a sequence of n trials E1 occurs exactly r times. Then
$$f(r) = P\{X = r\} = C_r^n p^r q^{n-r}$$
is the probability function, and
$$F(x) = P\{X \le x\} = \sum_{r=0}^{[x]} C_r^n p^r q^{n-r}$$
is the probability distribution function (CDF), i.e. the binomial distribution function.
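A minimal sketch of the binomial probability function and CDF; the parameters n = 10 and p = 0.3 are illustrative assumptions:

```python
from math import comb, floor

def binom_pmf(r, n, p):
    return comb(n, r) * p**r * (1 - p)**(n - r)          # C_r^n p^r q^(n-r)

def binom_cdf(x, n, p):
    return sum(binom_pmf(r, n, p) for r in range(floor(x) + 1))

n, p = 10, 0.3
print(binom_pmf(3, n, p))     # f(3) = P{X = 3}
print(binom_cdf(3, n, p))     # F(3) = P{X <= 3}
```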
Poisson’s Distribution
• A random variable X is said to have a Poisson probability distribution if
$$P\{X = x\} = e^{-\lambda} \frac{\lambda^x}{x!}; \quad \lambda > 0; \quad x = 0, 1, 2, \ldots$$
• The Poisson probability distribution function (CDF) is
$$F(x) = \sum_{k=0}^{[x]} e^{-\lambda} \frac{\lambda^k}{k!}, \quad x \ge 0; \qquad F(x) = 0, \quad x < 0.$$
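The same probability function in a short sketch, with an illustrative value of λ:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)              # e^(-lambda) lambda^x / x!

lam = 2.0
print([round(poisson_pmf(x, lam), 4) for x in range(5)])  # first few probabilities
print(sum(poisson_pmf(x, lam) for x in range(100)))       # sums to ~1.0
```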
Expected Value of a Random Variable
• Let X be a discrete single-variate random variable with values $\{x_1, x_2, \ldots, x_n\}$ and the associated probability function $\{p_1, p_2, \ldots, p_n\}$.
• Then $\bar{X} = \sum_{k=1}^{n} p_k x_k$ is the average (statistical average) of X.
Expected Value of a Random Variable
• In general, if $\varphi(x)$ is a function of a random variable X (a weighting function), then its mean value
$$\overline{\varphi(x)} = \sum_{k=1}^{n} p_k \varphi(x_k)$$
is referred to as the expected value.
• E(X) is the expected value of X, and E(X+Y) is the expected value of X+Y.
Expected Value of a Random Variable
• When the function $\varphi(x)$ is of the form $\varphi(x) = X^j$, where j > 0, its expected value is called the moment of jth order of X.
• $E\{X\} = \bar{X}$ - the first-order moment (the mean)
• $E\{X^2\} = \overline{X^2}$ - the second-order moment
• ………
• $E\{X^j\} = \overline{X^j}$ - the jth-order moment
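A minimal sketch of the statistical average and the higher-order moments for a hypothetical discrete distribution (the values and probabilities below are assumptions for illustration):

```python
xs = [1, 2, 3, 4]
ps = [0.1, 0.2, 0.3, 0.4]

def moment(j, xs=xs, ps=ps):
    # E{X^j} = sum_k p_k * x_k^j
    return sum(p * x**j for x, p in zip(xs, ps))

print(moment(1))   # first-order moment (the mean)
print(moment(2))   # second-order moment
```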
Basic Concepts of Information Theory
A measure of uncertainty. Entropy.
The amount of Information
• How can we measure the information content of a discrete communication system?
• Suppose we consider a discrete random experiment and its sample space Ω. Let X be a random variable associated with Ω. If the experiment is repeated a large number of times, the values of X, when averaged, will approach E(X).
The amount of Information
• Can we find some numeric characteristic associated with the random experiment that provides a “measure” of the surprise or unexpectedness of the outcomes of the experiment?
The amount of Information
• C. Shannon suggested that the random variable –log P{Ek} is an indicative relative measure of the occurrence of the event Ek. The mean of this function is a good indication of the average uncertainty with respect to all outcomes of the experiment.
The amount of Information
• Consider the sample space Ω. Let us partition the sample space into a finite number of mutually exclusive events:
$$\{E\} = \{E_1, E_2, \ldots, E_n\}; \qquad \bigcup_{i=1}^{n} E_i = U$$
$$\{P\} = \{p_1, p_2, \ldots, p_n\}; \qquad \sum_{i=1}^{n} p_i = 1$$
• The probability space defined by such equations is called a complete finite scheme.
The amount of Information. Entropy.
• Our task is to associate a measure of uncertainty (a measure of “surprise”), $H(p_1, p_2, \ldots, p_n)$, with complete finite schemes.
• C. Shannon and N. Wiener suggested the following measure of uncertainty, the entropy:
$$H(X) = -\sum_{i=1}^{n} p_i \log p_i$$
Entropy of a Bit (a simple communication channel)
• A completely random bit with p = (½, ½) has H(p) = –(½ log ½ + ½ log ½) = –(–½ – ½) = 1.
• A deterministic bit with p = (1, 0) has H(p) = –(1 log 1 + 0 log 0) = –(0 + 0) = 0.
• A biased bit with p = (0.1, 0.9) has H(p) = 0.468996…
• In general, the entropy as a function of 0 ≤ P{X=1} ≤ 1 looks as follows:
[Figure: plot of the entropy of a bit as a function of P{X=1}.]
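A short sketch of the entropy in bits (base-2 logarithm) that reproduces the values quoted above:

```python
from math import log2

def entropy(probs):
    # H = -sum p_i log2 p_i; terms with p = 0 or p = 1 contribute 0
    return -sum(p * log2(p) for p in probs if 0 < p < 1)

print(entropy([0.5, 0.5]))   # 1.0
print(entropy([1.0, 0.0]))   # 0
print(entropy([0.1, 0.9]))   # 0.46899...
```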
The amount of Information. Entropy.
• We have to investigate the principal properties of this measure with respect to statistical problems of communication systems.
• We have to generalize this concept to two-dimensional probability schemes.
• Then we have to consider n-dimensional probability schemes.