Basic Concepts of Discrete Probability
Elements of the Probability Theory (continuation)

Bayes' Theorem
• Bayes' theorem (1763) solves the following problem: from a number of observations on the occurrence of an effect, one can make an estimate of the occurrence of the cause leading to that effect (it is also called the rule of inverse probability).

Bayes' Theorem
• Let A1 and A2 be two mutually exclusive and exhaustive events: $A_1 \cup A_2 = U$, $A_1 \cap A_2 = \varnothing$.
• Let A1 and A2 each have a subevent: $E_{A_1} \subseteq A_1$ and $E_{A_2} \subseteq A_2$, respectively.
• The event $E = E_{A_1} \cup E_{A_2}$ is of special interest to us. It can occur only when A1 or A2 occurs.

Bayes' Theorem
• Suppose we are given the information that E has occurred, and the conditional probabilities $P\{E \mid A_1\}$ and $P\{E \mid A_2\}$ (a priori probabilities) are assumed to be known.
• Bayes' problem is formulated as follows: how likely is it that A1 or A2 has occurred, given the occurrence of E (a posteriori probabilities)?

Bayes' Theorem
• A priori probabilities: $P\{A_1\} = \pi_1$, $P\{A_2\} = \pi_2$; $A_1 \cup A_2 = U$, $A_1 \cap A_2 = \varnothing$; $P\{E \mid A_1\} = p_1$, $P\{E \mid A_2\} = p_2$.
• A posteriori probabilities (to be found): $P\{A_1 \mid E\} = ?$, $P\{A_2 \mid E\} = ?$

Bayes' Theorem
• Since $E = E_{A_1} \cup E_{A_2}$, then for the mutually exclusive events:
$P\{E\} = P\{E_{A_1}\} + P\{E_{A_2}\} = P\{A_1\}P\{E \mid A_1\} + P\{A_2\}P\{E \mid A_2\} = \pi_1 p_1 + \pi_2 p_2$
$P\{A_1 \mid E\} = \dfrac{P\{A_1 E\}}{P\{E\}} = \dfrac{P\{A_1\}P\{E \mid A_1\}}{P\{A_1\}P\{E \mid A_1\} + P\{A_2\}P\{E \mid A_2\}} = \dfrac{\pi_1 p_1}{\pi_1 p_1 + \pi_2 p_2}$
$P\{A_2 \mid E\} = \dfrac{P\{A_2 E\}}{P\{E\}} = \dfrac{P\{A_2\}P\{E \mid A_2\}}{P\{A_1\}P\{E \mid A_1\} + P\{A_2\}P\{E \mid A_2\}} = \dfrac{\pi_2 p_2}{\pi_1 p_1 + \pi_2 p_2}$

Bayes' Theorem (general case)
• Let $E = \bigcup_{k=1}^{n} E_{A_k}$; $P\{E \mid A_k\} = p_k$; $P\{A_k\} = \pi_k$, $k = 1, \dots, n$.
• Then
$P\{A_k \mid E\} = \dfrac{P\{A_k\}P\{E \mid A_k\}}{P\{E\}} = \dfrac{P\{A_k\}P\{E \mid A_k\}}{\sum_{i=1}^{n} P\{E \mid A_i\}P\{A_i\}} = \dfrac{\pi_k p_k}{\sum_{i=1}^{n} \pi_i p_i}$

Bayes' Theorem and Communication Channels
• Consider again the problem of sending a bit of information from sender to receiver over a binary symmetric channel: a transmitted 0 or 1 is received correctly with probability 1−p and flipped with probability p.
• Before a bit b ∈ {0, 1} has been transmitted, the receiver has no information: p(0) = p(1) = ½. The transmission of the bit value changes these probabilities: if the bit value b′ = 0 has been received, we assign a higher probability that the transmitted bit was b = 0 rather than b = 1. This probability is calculated using Bayes' theorem.

Bayes' Theorem and Communication Channels
• Let us apply Bayes' theorem to the noisy channel, where the sender's bit is the random variable X and the received bit is Y. For p = 0.1 the channel probabilities are P{Y=0|X=0} = P{Y=1|X=1} = 0.9 and P{Y=1|X=0} = P{Y=0|X=1} = 0.1.
• 1) Take p = 0.1 and use the channel without error correction. We have P{X=0|Y=0} = P{X=1|Y=1} = 0.9 and P{X=1|Y=0} = P{X=0|Y=1} = 0.1.
• 2) If we use the code in which each bit is sent 3 times, we get P{X=0|Y=0} = P{X=1|Y=1} = 0.972 and P{X=1|Y=0} = P{X=0|Y=1} = 0.028.
• Thus, in either case the Bayes posterior distribution P{X|Y} is less random than the prior (½, ½).
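The posterior probabilities quoted above can be checked numerically. Below is a minimal Python sketch (ours, not part of the original slides); the function names `posterior_correct` and `posterior_repetition` are our own, and the repetition-code case assumes majority decoding of three independent uses of the same binary symmetric channel with a uniform prior on the sent bit.

```python
from math import comb

def posterior_correct(p: float) -> float:
    """Bayes: P{X=0 | Y=0} for a binary symmetric channel with crossover
    probability p and a uniform prior P{X=0} = P{X=1} = 1/2."""
    prior = 0.5
    # Total probability: P{Y=0} = P{X=0}P{Y=0|X=0} + P{X=1}P{Y=0|X=1}
    evidence = prior * (1 - p) + prior * p
    return prior * (1 - p) / evidence

def posterior_repetition(p: float, n: int = 3) -> float:
    """Posterior that the sent bit equals the majority vote over n independent
    channel uses (n odd, uniform prior); by symmetry this equals the probability
    that the majority of the n received copies is correct."""
    return sum(comb(n, k) * (1 - p) ** k * p ** (n - k)
               for k in range(n // 2 + 1, n + 1))

print(posterior_correct(0.1))        # ≈ 0.9
print(posterior_repetition(0.1, 3))  # ≈ 0.972
```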
Random Variables
• A random variable is a real-valued function defined over the sample space of a random experiment: $X: \Omega \to \mathbb{R}$.
• A random variable is called discrete if its range is either finite or countably infinite.
• A random variable establishes the correspondence between a point of Ω and a point in the "coordinate space" associated with the corresponding experiment.

Discrete Probability Function and Distribution
• Any discrete random variable X assumes different values in the coordinate space: $x_1, x_2, \dots, x_n, \dots$
• The probability distribution function (the cumulative distribution function, CDF) is defined as $F(x) = \sum_{x_i \le x} f(x_i)$, where $f(x_k) = P\{X = x_k\} = p_k$ is the probability function.

Discrete Probability Function
• Thus, the discrete random variable X "produces" letters x from a countable (typically finite) alphabet Ψ with the probability function f(x) = P{X = x}, x ∈ Ψ; likewise f(x′) = P{X = x′} and f(x″) = P{X = x″} for x′, x″ ∈ Ψ.

Discrete Probability Distribution Function (CDF)
• The following properties of the CDF follow from the axioms of probability:
• F(x) is a nondecreasing function: if $x_1 \le x_2$ then $F(x_1) \le F(x_2)$;
• $\lim_{x \to \infty} F(x) = 1$; $\lim_{x \to -\infty} F(x) = 0$;
• $P\{x_i < X \le x_j\} = F(x_j) - F(x_i)$ for every $x_i < x_j$.

Bivariate Discrete Distribution
• In most engineering problems the interrelation between two random quantities (pairs of values $(x_j, y_k)$ — a vector-valued random variable) leads to a bivariate discrete distribution.
• The joint probability function and distribution function (CDF) are, respectively:
$f(x, y) = P\{X = x, Y = y\}$, $\quad F(x, y) = P\{X \le x, Y \le y\}$

Bivariate Discrete Distribution
• The marginal probability functions and distribution functions (CDF) are, respectively:
$f_1(x_i) = P\{X = x_i, \text{all permissible } Y\text{'s}\} = \sum_{y} f(x_i, y)$
$f_2(y_i) = P\{Y = y_i, \text{all permissible } X\text{'s}\} = \sum_{x} f(x, y_i)$
$F_1(x_i) = \sum_{x_k \le x_i} f_1(x_k), \quad F_2(y_i) = \sum_{y_k \le y_i} f_2(y_k)$

Bivariate Discrete Distribution
• The marginal probability $f_1(x_i)$ is the probability of occurrence of those events for which $X = x_i$, without regard to the value of Y.
• If the random variables X and Y are such that for all i, j
$f(x_i, y_j) = P\{X = x_i, Y = y_j\} = f_1(x_i) f_2(y_j)$,
then the variables X and Y are said to be statistically independent.

Combinatorics and Probability
• For example, suppose engineering students have Calculus (C), Physics (P), and Information Theory (I) classes today. How can we calculate the probability that I is the last class?
• The following 6 arrangements are possible: CPI, CIP, PCI, PIC, ICP, IPC. Two of them are desirable: CPI and PCI. Thus, if all arrangements are equiprobable, the probability is 2/6 = 1/3.

Combinatorics and Probability
• Suppose engineering students take Calculus (C), Physics (P), and Information Theory (I) classes this semester, two classes per day. How can we calculate the probability that I and P are taken on the same day and P is the first class?
• There are 6 different arrangements of 2 objects selected from 3: CP, PC, CI, IC, IP, PI. One of them is desirable: PI. Thus, the probability is 1/6.

Combinatorics and Probability
• The number of different permutations of n objects is $P(n) = n!$
• The number of different (ordered) arrangements of r objects selected from n is the number of all possible permutations of n objects (n!) divided by the number of all possible permutations of n−r objects ((n−r)!): $A_r^n = \dfrac{n!}{(n-r)!}$

Combinatorics and Probability
• Suppose again that the students take Calculus (C), Physics (P), and Information Theory (I), two classes per day. How can we calculate the probability that I and P are taken on the same day?
• There are 3 different combinations of 2 objects selected from 3: (CP=PC), (CI=IC), (IP=PI). One of them is desirable: (IP=PI). Thus, the probability is 1/3.

Combinatorics and Probability
• The number of different (unordered) combinations of r objects selected from n is the number of all possible arrangements of r objects selected from n, $A_r^n$, divided by the number of all possible permutations of r objects (r!): $C_r^n = \dfrac{A_r^n}{r!} = \dfrac{n!}{r!(n-r)!}$
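As a quick illustration of these counting rules (a sketch we added, not from the slides), the three classroom probabilities above can be reproduced by direct enumeration, and the $A_r^n$ and $C_r^n$ formulas checked against Python's math.perm and math.comb:

```python
from itertools import permutations, combinations
from math import factorial, comb, perm

classes = ["C", "P", "I"]  # Calculus, Physics, Information Theory

# P{I is the last of the three classes}: enumerate all 3! orderings.
orders = list(permutations(classes))
print(sum(o[-1] == "I" for o in orders) / len(orders))      # 2/6 = 1/3

# P{the two classes of the day are P first, then I}: ordered pairs out of 3.
pairs = list(permutations(classes, 2))
print(sum(o == ("P", "I") for o in pairs) / len(pairs))     # 1/6

# P{the two classes of the day are I and P, in either order}: unordered pairs.
unordered = list(combinations(classes, 2))
print(sum(set(o) == {"I", "P"} for o in unordered) / len(unordered))  # 1/3

# The counting formulas behind the enumeration (n = 3, r = 2):
n, r = 3, 2
assert perm(n, r) == factorial(n) // factorial(n - r) == len(pairs)                      # A_r^n
assert comb(n, r) == factorial(n) // (factorial(r) * factorial(n - r)) == len(unordered)  # C_r^n
```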
Combinatorics and Probability
• Binomial meaning: as discovered by I. Newton, $C_r^n$, $r = 0, \dots, n$, are the coefficients of the binomial decomposition:
$(a+b)^n = C_0^n a^n + C_1^n a^{n-1} b + C_2^n a^{n-2} b^2 + \dots + C_r^n a^{n-r} b^r + \dots + C_n^n b^n$

Binomial Distribution
• Let a random experiment have only two possible outcomes, E1 and E2, and let the probabilities of their occurrence be p and q = 1−p, respectively. If the experiment is repeated n times and successive trials are independent of each other, the probability of obtaining E1 and E2 exactly r and n−r times, respectively, is $C_r^n p^r q^{n-r}$.

Binomial Distribution
• Let a random variable X take the value r if in a sequence of n trials E1 occurs exactly r times. Then
$f(r) = P\{X = r\} = C_r^n p^r q^{n-r}$ — the probability function
$F(x) = P\{X \le x\} = \sum_{r=0}^{[x]} C_r^n p^r q^{n-r}$ — the probability distribution function (CDF), the binomial distribution function

Poisson's Distribution
• A random variable X is said to have a Poisson probability distribution if
$P\{X = x\} = \dfrac{e^{-\lambda} \lambda^x}{x!}; \quad \lambda > 0; \quad x = 0, 1, 2, \dots$
• The Poisson probability distribution function (CDF) is
$F(x) = e^{-\lambda} \sum_{k=0}^{[x]} \dfrac{\lambda^k}{k!}$ for $x \ge 0$; $F(x) = 0$ for $x < 0$.

Expected Value of a Random Variable
• Let X be a discrete single-variate random variable that takes the values $x_1, x_2, \dots, x_n$ with the associated probabilities $p_1, p_2, \dots, p_n$.
• Then $\overline{X} = \sum_{k=1}^{n} p_k x_k$ is the average (statistical average) of X.

Expected Value of a Random Variable
• In general, if $\varphi(x)$ is a function of a random variable X (a weighting function), then its mean value
$\overline{\varphi(X)} = \sum_{k=1}^{n} p_k \varphi(x_k)$
is referred to as the expected value.
• E(X) is the expected value of X; E(X+Y) is the expected value of X+Y.

Expected Value of a Random Variable
• When the function $\varphi(x)$ is of the form $\varphi(X) = X^j$, where j > 0, its expected value is called the moment of jth order of X:
• $E[X] = \overline{X}$ — first-order moment (mean)
• $E[X^2] = \overline{X^2}$ — second-order moment
• …
• $E[X^j] = \overline{X^j}$ — jth-order moment
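To make the moment definitions concrete, here is a small illustrative Python sketch (ours, not from the slides): it builds the binomial probability function $f(r) = C_r^n p^r q^{n-r}$ defined above and computes its first- and second-order moments; the names `binomial_pmf` and `moment` are our own.

```python
from math import comb, isclose

def binomial_pmf(n: int, p: float) -> dict:
    """Probability function f(r) = C(n, r) p^r q^(n-r) of the binomial distribution."""
    q = 1.0 - p
    return {r: comb(n, r) * p ** r * q ** (n - r) for r in range(n + 1)}

def moment(pmf: dict, j: int) -> float:
    """j-th order moment E[X^j] = sum_k p_k * x_k^j of a discrete random variable."""
    return sum(p_k * x_k ** j for x_k, p_k in pmf.items())

pmf = binomial_pmf(n=5, p=0.3)
mean = moment(pmf, 1)            # first-order moment: E[X] = n*p = 1.5
second = moment(pmf, 2)          # second-order moment: E[X^2]
variance = second - mean ** 2    # for the binomial distribution this equals n*p*q
assert isclose(mean, 5 * 0.3)
assert isclose(variance, 5 * 0.3 * 0.7)
```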
Basic Concepts of Information Theory
A measure of uncertainty. Entropy.

The amount of Information
• How can we measure the information content of a discrete communication system?
• Suppose we consider a discrete random experiment and its sample space Ω. Let X be a random variable associated with Ω. If the experiment is repeated a large number of times, the values of X, when averaged, will approach E(X).

The amount of Information
• Could we find some numeric characteristic associated with the random experiment that provides a "measure" of surprise, or unexpectedness, of the occurrence of the outcomes of the experiment?

The amount of Information
• C. Shannon suggested that the random variable −log P{Ek} is an indicative relative measure of the occurrence of the event Ek. The mean of this function is a good indication of the average uncertainty with respect to all outcomes of the experiment.

The amount of Information
• Consider the sample space Ω and partition it into a finite number of mutually exclusive events:
$E = \{E_1, E_2, \dots, E_n\}; \quad \bigcup_{i=1}^{n} E_i = U$
$P = \{p_1, p_2, \dots, p_n\}; \quad \sum_{i=1}^{n} p_i = 1$
• A probability space defined by such equations is called a complete finite scheme.

The amount of Information. Entropy.
• Our task is to associate a measure of uncertainty (a measure of "surprise"), $H(p_1, p_2, \dots, p_n)$, with complete finite schemes.
• C. Shannon and N. Wiener suggested the following measure of uncertainty — the entropy:
$H(X) = -\sum_{i=1}^{n} p_i \log p_i$

Entropy of a Bit (a simple communication channel)
• A completely random bit with p = (½, ½) has H(p) = −(½ log ½ + ½ log ½) = −(−½ − ½) = 1.
• A deterministic bit with p = (1, 0) has H(p) = −(1 log 1 + 0 log 0) = −(0 + 0) = 0.
• A biased bit with p = (0.1, 0.9) has H(p) = 0.468996… (a numerical check appears at the end of this section).
• In general, the entropy as a function of 0 ≤ P{X=1} ≤ 1 traces the binary entropy curve (figure omitted).

The amount of Information. Entropy.
• We have to investigate the principal properties of this measure with respect to the statistical problems of communication systems.
• We have to generalize this concept to two-dimensional probability schemes.
• Then we have to consider n-dimensional probability schemes.
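As a numerical check of the bit entropies listed above, the following short Python sketch (added by us, not part of the slides) evaluates $H(X) = -\sum_i p_i \log_2 p_i$, treating 0·log 0 as 0; the helper name `entropy` is our own.

```python
from math import log2

def entropy(probs) -> float:
    """Shannon entropy H = -sum_i p_i * log2(p_i), with 0*log(0) taken as 0."""
    return sum(-p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0        -- completely random bit
print(entropy([1.0, 0.0]))   # 0.0        -- deterministic bit
print(entropy([0.1, 0.9]))   # ≈ 0.468996 -- biased bit
```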