Random Variable
• A random variable X is a function that assigns a real number, X(ζ), to each outcome ζ in the sample space of a random experiment.
• Domain of the random variable: S. Range of the random variable: S_X.
• Example 1: Suppose that a coin is tossed 3 times and the sequence of heads and tails is noted.
  Sample space S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
  X: number of heads in three coin tosses.
  ζ:    HHH HHT HTH THH HTT THT TTH TTT
  X(ζ):  3   2   2   2   1   1   1   0
  S_X = {0, 1, 2, 3}

Probability of a Random Variable
• Example 2: The event {X = k} = {k heads in three coin tosses} occurs when the outcome of the coin-tossing experiment contains k heads.
  P[X = 0] = P[{TTT}] = 1/8
  P[X = 1] = P[{HTT}] + P[{THT}] + P[{TTH}] = 3/8
  P[X = 2] = P[{HHT}] + P[{HTH}] + P[{THH}] = 3/8
  P[X = 3] = P[{HHH}] = 1/8
• Conclusion: for B ⊂ S_X, let A = {ζ: X(ζ) ∈ B}. Then P[B] = P[A] = P[{ζ: X(ζ) ∈ B}]. Events A and B are referred to as equivalent events. All numerical events of practical interest involve {X = x} or {X ∈ I}.

Events Defined by a Random Variable
• If X is a r.v. and x is a fixed real number, we can define the events
  (X = x) = {ζ: X(ζ) = x}
  (x1 < X ≤ x2) = {ζ: x1 < X(ζ) ≤ x2}
  These events have probabilities denoted by
  P[X = x] = P[{ζ: X(ζ) = x}]
  P[x1 < X ≤ x2] = P[{ζ: x1 < X(ζ) ≤ x2}]

Distribution Function
• The cumulative distribution function (cdf) of a random variable X is defined as the probability of the event {X ≤ x}:
  F_X(x) = P[X ≤ x]   for −∞ < x < +∞
• In terms of the underlying sample space, the cdf is the probability of the event {ζ: X(ζ) ≤ x}.
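The coin-toss pmf and cdf above can be checked by direct enumeration. A minimal sketch in Python (not part of the original slides; names like `pmf` and `cdf` are our own):

```python
from itertools import product
from fractions import Fraction

# Sample space for 3 fair coin tosses; X = number of heads (Examples 1 and 2).
S = list(product("HT", repeat=3))
p_outcome = Fraction(1, len(S))  # each of the 8 outcomes is equally likely

# pmf: p_X(k) = P[X = k], accumulated over equivalent outcomes
pmf = {}
for outcome in S:
    k = outcome.count("H")  # X(ζ) for this outcome
    pmf[k] = pmf.get(k, 0) + p_outcome

# cdf: F_X(x) = P[X <= x]
def cdf(x):
    return sum(p for k, p in pmf.items() if k <= x)
```

Evaluating gives p_X = (1/8, 3/8, 3/8, 1/8) for k = 0..3 and, e.g., F_X(1) = 4/8, matching the slides.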
• Properties:
  0 ≤ F_X(x) ≤ 1
  lim_{x→+∞} F_X(x) = 1
  lim_{x→−∞} F_X(x) = 0
  a ≤ b ⇒ F_X(a) ≤ F_X(b)   (nondecreasing)
  F_X(b) = lim_{h→0} F_X(b + h) = F_X(b⁺)   (right-continuous)
  P[a < X ≤ b] = F_X(b) − F_X(a)
  P[X = b] = F_X(b) − F_X(b⁻)
  P[X > b] = 1 − F_X(b)

A Typical Example of a cdf
• Tossing a coin 3 times and counting the number of heads:

  x    {X ≤ x}                               F_X(x)
  −1   Ø                                     0
  0    {TTT}                                 1/8
  1    {TTT, TTH, THT, HTT}                  4/8
  2    {TTT, TTH, THT, HTT, HHT, HTH, THH}   7/8
  3    S                                     1
  4    S                                     1

Two Types of Random Variables
• A discrete random variable has a countable number of possible values. Example: X = number of heads in 5 coin tosses; the values are countable.
• A continuous random variable takes all values in an interval of numbers. Example: X = the time it takes for a bulb to burn out; the values are not countable.

Example of cdf for a Discrete Random Variable
• Consider the r.v. X defined in Example 2.

Discrete Random Variable and Probability Mass Function
• Let X be a r.v. with cdf F_X(x). If F_X(x) changes value only in jumps and is constant between jumps, i.e. F_X(x) is a staircase function, then X is called a discrete random variable.
• Suppose x_i < x_j for i < j. Then
  P(X = x_i) = P(X ≤ x_i) − P(X ≤ x_{i−1}) = F_X(x_i) − F_X(x_{i−1})
• Let p_X(x) = P(X = x). The function p_X(x) is called the probability mass function (pmf) of the discrete r.v. X.
• Properties of p_X(x):
  0 ≤ p_X(x_k) ≤ 1,  k = 1, 2, ...
  p_X(x) = 0 if x ≠ x_k (k = 1, 2, ...)
  Σ_k p_X(x_k) = 1

Example of pmf for a Discrete r.v.
• Consider the r.v. X defined in Example 2.

Continuous Random Variable and Probability Density Function
• Let X be a r.v. with cdf F_X(x). If F_X(x) is continuous and also has a derivative dF_X(x)/dx, which exists everywhere except at possibly a finite number of points and is piecewise continuous, then X is called a continuous random variable.
• Let
  f_X(x) = dF_X(x)/dx
• The function f_X(x) is called the probability density function (pdf) of the continuous r.v. X. f_X(x) is piecewise continuous.
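The relation f_X = dF_X/dx can be verified numerically. A small sketch, assuming the exponential cdf F_X(x) = 1 − e^{−λx} (which appears later in these slides) with λ = 2 chosen arbitrarily for illustration:

```python
import math

lam = 2.0  # assumed rate parameter for the illustration

def cdf(x):
    # Exponential cdf: F_X(x) = 1 - exp(-lam*x) for x >= 0, else 0
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def pdf(x):
    # Its derivative: f_X(x) = lam * exp(-lam*x) for x >= 0, else 0
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

# Check f_X(x) ≈ dF_X/dx via a central difference at a few interior points.
h = 1e-6
for x in (0.1, 0.5, 2.0):
    numeric = (cdf(x + h) - cdf(x - h)) / (2 * h)
    assert abs(numeric - pdf(x)) < 1e-5
```

The central difference agrees with the analytic pdf to well within the tolerance, illustrating that the pdf is the slope of the cdf wherever the derivative exists.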
• Properties:
  f_X(x) ≥ 0
  ∫_{−∞}^{+∞} f_X(x) dx = 1
  P(a ≤ X ≤ b) = ∫_a^b f_X(x) dx

Conditional Distribution
• The conditional probability of an event A given event B is defined as
  P(A | B) = P(A ∩ B) / P(B)
• The conditional cdf F_X(x|B) of a r.v. X given event B is defined as
  F_X(x | B) = P(X ≤ x | B) = P[(X ≤ x) ∩ B] / P(B)
• If X is discrete, the conditional pmf p_X(x_k|B) is defined by
  p_X(x_k | B) = P(X = x_k | B) = P[(X = x_k) ∩ B] / P(B)
• If X is a continuous r.v., the conditional pdf f_X(x|B) is defined by
  f_X(x | B) = dF_X(x | B)/dx

Mean and Variance
• Mean: the mean (or expected value) of a r.v. X, denoted by μ_X or E(X), is defined by
  μ_X = E(X) = Σ_k x_k p_X(x_k)   (discrete)
  μ_X = E(X) = ∫ x f_X(x) dx   (continuous)
• Moment: the nth moment of a r.v. X is defined by
  E(X^n) = Σ_k x_k^n p_X(x_k)   (discrete)
  E(X^n) = ∫ x^n f_X(x) dx   (continuous)
• Variance: the variance of a r.v. X, denoted by σ_X² or Var(X), is defined by
  σ_X² = Var(X) = E{[X − E(X)]²} = E(X²) − [E(X)]²
  σ_X² = Σ_k (x_k − μ_X)² p_X(x_k)   (discrete)
  σ_X² = ∫ (x − μ_X)² f_X(x) dx   (continuous)

Expectation of a Function of a Random Variable
• Given a r.v. X and its probability distribution (pmf in the discrete case, pdf in the continuous case), how do we calculate the expected value of some function of X, E[g(X)]?
• Proposition:
  (a) If X is a discrete r.v. with pmf p_X(x), then for any real-valued function g,
      E[g(X)] = Σ_k g(x_k) p(x_k)
  (b) If X is a continuous r.v. with pdf f_X(x), then for any real-valued function g,
      E[g(X)] = ∫ g(x) f(x) dx

Limit Theorems
• Markov's inequality: If X is a r.v. that takes only nonnegative values, then for any value a > 0,
  P{X ≥ a} ≤ E[X]/a
• Chebyshev's inequality: If X is a random variable with mean μ and variance σ², then for any value k > 0,
  P{|X − μ| ≥ k} ≤ σ²/k²

Application of the Limit Theorems
• Suppose we know that the number of items produced in a factory during a week is a random variable with mean 500.
• (a) What can be said about the probability that this week's production will be at least 1000?
• (b) If the variance of a week's production is known to equal 100, what can be said about the probability that this week's production will be between 400 and 600?
• Solution: Let X be the number of items produced in a week.
  (a) By Markov's inequality, P{X ≥ 1000} ≤ E[X]/1000 = 0.5
  (b) By Chebyshev's inequality, P{|X − 500| ≥ 100} ≤ σ²/100² = 0.01, so
      P{|X − 500| < 100} ≥ 1 − 0.01 = 0.99

Some Special Distributions
• Bernoulli Distribution
• Binomial Distribution
• Poisson Distribution
• Uniform Distribution
• Exponential Distribution
• Normal (or Gaussian) Distribution
• Conditional Distribution
• ……

Bernoulli Random Variable
• An experiment whose outcome is either a "success" or a "failure" is performed. Let X = 1 if the outcome is a "success" and X = 0 if it is a "failure". If the pmf is given as follows, such experiments are called Bernoulli trials and X is said to be a Bernoulli random variable:
  S_X = {0, 1}
  p_X(1) = p,  p_X(0) = 1 − p   (0 ≤ p ≤ 1)
  μ_X = E[X] = p
  σ_X² = Var[X] = p(1 − p)
• Example: Tossing a coin once. Heads and tails are equally likely to occur, so p = 0.5: p_X(1) = P(H) = 0.5, p_X(0) = P(T) = 0.5.

Binomial Random Variable
• Suppose n independent Bernoulli trials, each of which results in a "success" with probability p and in a "failure" with probability 1 − p, are to be performed. Let X represent the number of successes that occur in the n trials; then X is said to be a binomial random variable with parameters (n, p).
  S_X = {0, 1, 2, ..., n}
  p_X(k) = C(n, k) p^k (1 − p)^{n−k},  k = 0, 1, ..., n
  μ_X = E[X] = np,  σ_X² = np(1 − p)
• Example: Toss a coin 3 times, X = number of heads, p = 0.5:
  p_X(0) = 0.125, p_X(1) = 0.375, p_X(2) = 0.375, p_X(3) = 0.125

Geometric Random Variable
• Suppose independent trials, each having probability p of being a success, are performed until a success occurs. Let X be the number of trials required until the first success occurs; then X is said to be a geometric random variable with parameter p.
  S_X = {1, 2, ...}
  p_X(k) = p(1 − p)^{k−1},  k = 1, 2, ...
  μ_X = E[X] = 1/p,  σ_X² = (1 − p)/p²
• Example: Consider an experiment of rolling a fair die. The average number of rolls required in order to obtain a 6:
  μ_X = E(X) = 1/p = 1/(1/6) = 6

Poisson Random Variable
• A r.v. X is called a Poisson random variable with parameter λ (> 0) if its pmf is given by
  p_X(k) = P(X = k) = e^{−λ} λ^k / k!,  k = 0, 1, 2, ...
  S_X = {0, 1, 2, ...}
  μ_X = E[X] = λ,  σ_X² = λ
• An important property of the Poisson r.v. is that it may be used to approximate a binomial r.v. when the binomial parameter n is large and p is small. Let λ = np; then
  P(X = k) ≈ e^{−λ} λ^k / k!

Uniform Random Variable
  S_X = [a, b]
  f_X(x) = 1/(b − a),  a ≤ x ≤ b
  μ_X = E[X] = (a + b)/2,  σ_X² = (b − a)²/12
• A uniform r.v. X is often used when we have no prior knowledge of the actual pdf and all continuous values in some range seem equally likely.

Exponential Random Variable
  S_X = [0, ∞)
  f_X(x) = λe^{−λx},  x ≥ 0, λ > 0
  F_X(x) = 1 − e^{−λx}
  μ_X = E[X] = 1/λ,  σ_X² = 1/λ²
• The most interesting property of the exponential r.v. is that it is "memoryless":
  P(X > x + t | X > t) = P(X > x),  x, t ≥ 0
  X can be the lifetime of a component.

Gaussian (Normal) Random Variable
  S_X = (−∞, ∞)
  f_X(x) = (1/(√(2π)σ)) e^{−(x−μ)²/(2σ²)},  −∞ < x < ∞, σ > 0
  μ_X = E[X] = μ,  σ_X² = σ²
• An important fact about a normal r.v. is that if X is normally distributed with parameters μ and σ², then Y = aX + b is normally distributed with parameters aμ + b and a²σ².
• Application: the central limit theorem. The sum of a large number of independent r.v.'s can, under certain conditions, be approximated by a normal r.v., denoted by N(μ, σ²).

The Moment Generating Function
  φ_X(t) = E[e^{tX}] = Σ_k e^{t x_k} p(x_k)   (discrete)
  φ_X(t) = E[e^{tX}] = ∫ e^{tx} f(x) dx   (continuous)
• The important property: all of the moments of X can be obtained by successive differentiation:
  φ^{(n)}(0) = E[X^n]

Application of the Moment Generating Function
• The binomial distribution (n, p):
  φ(t) = E[e^{tX}] = Σ_{k=0}^{n} e^{tk} C(n, k) p^k (1 − p)^{n−k} = (pe^t + 1 − p)^n
  φ'(t) = n(pe^t + 1 − p)^{n−1} pe^t
  E[X] = φ'(0) = np

Entropy
• Entropy is a measure of the uncertainty in a random experiment.
• Let X be a discrete r.v. with S_X = {x_1, x_2, ..., x_K} and pmf p_k = P[X = x_k]. Let A_k denote the event {X = x_k}.
• Intuitive facts: the uncertainty of A_k is low if p_k is close to one, and it is high if p_k is close to zero.
• Measure of uncertainty:
  I(X = x_k) = ln(1/P[X = x_k]) = −ln P[X = x_k]

Entropy of a Random Variable
• The entropy of a r.v. X is defined as the expected value of the uncertainty of its outcomes:
  H_X = E[I(X)] = Σ_k p(x_k) ln(1/p(x_k)) = −Σ_k p(x_k) ln p(x_k)
• The entropy is in units of "bits" when the logarithm is base 2.
• Independent fair coin flips have an entropy of 1 bit per flip. A source that always generates a long string of A's has an entropy of 0, since the next character will always be an 'A'.

Entropy of a Binary Random Variable
• Suppose r.v. X with S_X = {0, 1} and p = P[X = 0] = 1 − P[X = 1] (flipping a coin).
• H_X = h(p) is symmetric about p = 0.5 and achieves its maximum at p = 0.5.
• The uncertainties of the events (X = 0) and (X = 1) vary together in a complementary manner.
• The highest average uncertainty occurs when p(0) = p(1) = 0.5.

Reduction of Entropy Through Partial Information
• Entropy quantifies uncertainty by the amount of information required to specify the outcome of a random experiment.
• Example: Suppose r.v. X equally likely takes on values from the set {000, 001, 010, ..., 111} (flipping a coin 3 times). Given the event A = {X begins with a 1} = {100, 101, 110, 111}, what is the change in the entropy of X?
  H_X = −(1/8) log₂(1/8) − ... − (1/8) log₂(1/8) = 3 bits
  H_{X|A} = −(1/4) log₂(1/4) − ... − (1/4) log₂(1/4) = 2 bits

Thanks! Questions?

Extending Discrete Entropy to the Continuous Case: Differential Entropy
• Quantization method: Let X be a continuous r.v. that takes on values in the interval [a, b]. Divide [a, b] into a large number K of subintervals of length ∆. Let Q(X) be the midpoint of the subinterval that contains X. Find the entropy of Q.
• Let x_k be the midpoint of the kth subinterval; then
  P[Q = x_k] = P[X is in the kth subinterval] = P[x_k − ∆/2 < X < x_k + ∆/2] ≈ f_X(x_k)∆
• Entropy of Q:
  H_Q = −Σ_{k=1}^{K} P[Q = x_k] ln P[Q = x_k]
      ≈ −Σ_{k=1}^{K} f_X(x_k)∆ ln(f_X(x_k)∆)
      = −Σ_{k=1}^{K} f_X(x_k)∆ ln f_X(x_k) − ln ∆
• Trade-off: as ∆ → 0, H_Q → ∞.
• Differential entropy is defined as
  H_X = −∫ f_X(x) ln f_X(x) dx = E[−ln f_X(X)]

The Method of Maximum Entropy
• The maximum entropy method is a procedure for estimating the pmf or pdf of a random variable when only partial information about X, in the form of expected values of functions of X, is available.
• Discrete case: X is a r.v. with S_X = {x_1, x_2, ..., x_K} and unknown pmf p_X(x_k). Given the expected value of some function g(X) of X:
  Σ_{k=1}^{K} g(x_k) p_X(x_k) = c
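The quantization argument for differential entropy can be checked concretely. A sketch (our own illustration, assuming a Uniform(a, b) density with a = 0, b = 4): for the uniform pdf, H_X = ln(b − a), and the quantized entropy satisfies H_Q ≈ H_X − ln ∆.

```python
import math

a, b = 0.0, 4.0          # assumed interval for the illustration
K = 1 << 12              # number of subintervals
delta = (b - a) / K      # subinterval length ∆
f = 1.0 / (b - a)        # uniform pdf value, constant on [a, b]

# P[Q = x_k] ≈ f_X(x_k) * ∆; for the uniform pdf this is exact and identical
# for every k, so H_Q = -sum_k P ln P over K equal terms.
p = f * delta
H_Q = -sum(p * math.log(p) for _ in range(K))

# Differential entropy of Uniform(a, b) is ln(b - a).
H_X = math.log(b - a)

# The slides' relation: H_Q ≈ H_X - ln ∆ (exact here, since f is constant).
assert abs(H_Q - (H_X - math.log(delta))) < 1e-9
```

Letting ∆ → 0 makes H_Q diverge like −ln ∆, which is the trade-off noted above and the reason the ln ∆ term is dropped when defining differential entropy.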