Expectations of Random Variables,
Functions of Random Variables
ECE 313
Probability with Engineering Applications
Lecture 15
Ravi K. Iyer
Dept. of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
Today’s Topics
• Midterm exam statistics
• Review of Erlang, gamma, and hyperexponential distributions
• Example of a failure detector with detection latency
• Start on expectations
– Expectations of important random variables
– Moments: mean and variance
• Announcements
– Homework 7: based on your midterm exam, individual problems are assigned to you to solve. Check Compass.
– Midterms are available today; the grades are posted on Compass.
– The in-class activity is replaced by a brief concept quiz on Wednesday.
– Mini Project 2 grades will be posted this week.
– Final project dates will be announced soon.
Midterm Histogram
MIN: 33   MAX: 105   MEDIAN: 85   MEAN: 84.6
Exam Statistics
Midterm Exam    Q1     Q2     Q3     Q4     Bonus   Total
Median          34     15     20     17     5       85
Average         33.3   14.2   16.8   15.5   4.8     84.6
Min             15     0      4      5      1       33
Max             40     20     20     20     7       105
Erlang and Gamma Distribution
• When r sequential phases have independent identical
exponential distributions, the resulting density (pdf) is known as
r-stage (or r-phase) Erlang:
$$f(t) = \frac{\lambda^r t^{r-1} e^{-\lambda t}}{(r-1)!}, \qquad t > 0,\ \lambda > 0,\ r = 1, 2, \ldots$$

• The CDF (cumulative distribution function) is:

$$F(t) = 1 - \sum_{k=0}^{r-1} \frac{(\lambda t)^k}{k!}\, e^{-\lambda t}, \qquad t > 0,\ \lambda > 0,\ r = 1, 2, \ldots$$

• Also, the failure (hazard) rate is:

$$h(t) = \frac{\lambda^r t^{r-1}}{(r-1)!\, \sum_{k=0}^{r-1} \dfrac{(\lambda t)^k}{k!}}, \qquad t > 0,\ \lambda > 0,\ r = 1, 2, \ldots$$
Erlang and Gamma Distribution (cont.)
• The exponential distribution is a special case of the Erlang
distribution with r = 1.
• Consider a component subjected to an environment such that Nt, the number of peak stresses in the interval (0, t], is Poisson distributed with parameter λt.
• The component can withstand (r - 1) peak stresses, and the rth occurrence of a peak stress causes a failure.
• The component lifetime X is related to Nt so that these two events are equivalent:

$$[X > t] \Longleftrightarrow [N_t < r]$$
Erlang and Gamma Distribution (cont.)
• Thus:

$$R(t) = P(X > t) = P(N_t < r) = \sum_{k=0}^{r-1} P(N_t = k) = \sum_{k=0}^{r-1} \frac{(\lambda t)^k}{k!}\, e^{-\lambda t}$$

• F(t) = 1 - R(t) yields the previous formula:

$$F(t) = 1 - \sum_{k=0}^{r-1} \frac{(\lambda t)^k}{k!}\, e^{-\lambda t}, \qquad t > 0,\ \lambda > 0,\ r = 1, 2, \ldots$$

• Conclusion: the component lifetime has an r-stage Erlang distribution.
Gamma Function and Density
• If r (call it α) takes nonintegral values, then we get the gamma density:

$$f(t) = \frac{\lambda^{\alpha} t^{\alpha-1} e^{-\lambda t}}{\Gamma(\alpha)}, \qquad \alpha > 0,\ \lambda > 0,\ t > 0$$

where the gamma function is defined by the integral:

$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx, \qquad \alpha > 0$$
Gamma Function and Density (cont.)
• The following properties of the gamma function are useful. Integration by parts shows that for α > 1:

$$\Gamma(\alpha) = (\alpha - 1)\,\Gamma(\alpha - 1)$$

• In particular, if α is a positive integer, denoted by n, then:

$$\Gamma(n) = (n - 1)!$$

• Other useful formulas related to the gamma function are:

$$\Gamma(1/2) = \sqrt{\pi}$$

and

$$\int_0^{\infty} x^{\alpha-1} e^{-\lambda x}\,dx = \frac{\Gamma(\alpha)}{\lambda^{\alpha}}$$
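For example, combining the recursion with Γ(1/2) = √π gives the half-integer values that appear in gamma densities with nonintegral shape:

$$\Gamma(5/2) = \tfrac{3}{2}\,\Gamma(3/2) = \tfrac{3}{2}\cdot\tfrac{1}{2}\,\Gamma(1/2) = \tfrac{3}{4}\sqrt{\pi}$$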
Hyperexponential Distribution
• A process with sequential phases gives rise to a
hypoexponential or an Erlang distribution, depending upon
whether or not the phases have identical distributions.
• If a process consists of alternate phases, i.e., during any single experiment the process experiences one and only one of the many alternate phases, and
• if these phases have independent exponential distributions, then
• the overall distribution is hyperexponential.
Hyperexponential Distribution (cont.)
• The density function of a k-phase hyperexponential random
variable is:
$$f(t) = \sum_{i=1}^{k} \alpha_i \lambda_i e^{-\lambda_i t}, \qquad t > 0,\ \lambda_i > 0,\ \alpha_i > 0,\ \sum_{i=1}^{k} \alpha_i = 1$$

• The distribution function is:

$$F(t) = \sum_i \alpha_i \left(1 - e^{-\lambda_i t}\right), \qquad t \ge 0$$

• The failure rate is:

$$h(t) = \frac{\sum_i \alpha_i \lambda_i e^{-\lambda_i t}}{\sum_i \alpha_i e^{-\lambda_i t}}, \qquad t > 0,$$

which is a decreasing failure rate, from $\sum_i \alpha_i \lambda_i$ at t = 0 down to min{λ1, λ2, ...} as t → ∞.
Hyperexponential Distribution (cont.)
• The hyperexponential is a special case of mixture distributions
that often arise in practice:
$$F(x) = \sum_i \alpha_i F_i(x), \qquad \sum_i \alpha_i = 1,\ \alpha_i \ge 0$$

• The hyperexponential distribution exhibits more variability than the exponential; e.g., the CPU service-time distribution in a computer system often exhibits this behavior.
• If a product is manufactured in several parallel assembly lines
and the outputs are merged, then the failure density of the
overall product is likely to be hyperexponential.
Example 3 (On-line Fault Detector)
• Consider a model consisting of a functional unit (e.g., an adder) together with an on-line fault detector.
• Let T and C denote the times to failure of the unit and the detector, respectively.
• After the unit fails, a finite time D (called the detection latency) is
required to detect the failure.
• Failure of the detector, however, is detected instantaneously.
Example 3 (On-line Fault Detector) cont.
• Let X denote the time to failure indication and Y denote the time
to failure occurrence (of either the detector or the unit).
• Then
X = min{T + D, C}
and
Y = min{T, C}.
• If the detector fails before the unit, then a false alarm is said to
have occurred.
• If the unit fails before the detector, then the unit keeps producing
erroneous output during the detection phase and thus
propagates the effect of the failure.
• The purpose of the detector is to reduce the detection time D.
Example 3 (On-line Fault Detector) cont.
• We define:
Real reliability: $R_r(t) = P(Y > t)$, and
Apparent reliability: $R_a(t) = P(X > t)$.
• A powerful detector will tend to narrow the gap between $R_r(t)$ and $R_a(t)$.
• Assume that T, D, and C are mutually independent and exponentially distributed with parameters λ, μ, and γ, respectively.
Example 3 (On-line Fault Detector) cont.
• Then Y is exponentially distributed with parameter λ + γ, and:

$$R_r(t) = e^{-(\lambda + \gamma)t}$$

• T + D is hypoexponentially distributed, so that:

$$F_{T+D}(t) = 1 - \frac{\mu}{\mu - \lambda}\, e^{-\lambda t} + \frac{\lambda}{\mu - \lambda}\, e^{-\mu t}$$
Example 3 (On-line Fault Detector) cont.
• And the apparent reliability is:

$$\begin{aligned}
R_a(t) &= P(X > t) \\
&= P(\min\{T + D,\, C\} > t) \\
&= P(T + D > t \text{ and } C > t) \\
&= P(T + D > t)\, P(C > t) \qquad \text{(by independence)} \\
&= [1 - F_{T+D}(t)]\, e^{-\gamma t} \\
&= \frac{\mu}{\mu - \lambda}\, e^{-(\lambda + \gamma)t} - \frac{\lambda}{\mu - \lambda}\, e^{-(\mu + \gamma)t}
\end{aligned}$$
Expectation of a Random Variable
• The Discrete Case: If X is a discrete random variable having a
probability mass function p(x), then the expected value of X is
defined by
$$E[X] = \sum_{x:\,p(x) > 0} x\, p(x)$$

The expected value of X is a weighted average of the possible values that X can take on, each value being weighted by the probability that X assumes that value. For example, if the probability mass function of X is given by

$$p(1) = p(2) = \tfrac{1}{2}$$

then

$$E[X] = 1\left(\tfrac{1}{2}\right) + 2\left(\tfrac{1}{2}\right) = \tfrac{3}{2}$$

is just an ordinary average of the two possible values 1 and 2 that X can assume.
Expectation of a Random Variable (Cont.)
Assume

$$p(1) = \tfrac{1}{3}, \qquad p(2) = \tfrac{2}{3}$$

Then

$$E[X] = 1\left(\tfrac{1}{3}\right) + 2\left(\tfrac{2}{3}\right) = \tfrac{5}{3}$$

is a weighted average of the two possible values 1 and 2, where the value 2 is given twice as much weight as the value 1 since p(2) = 2p(1).
– Find E[X] where X is the outcome when we roll a fair die.
– Solution: Since

$$p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = \tfrac{1}{6},$$

we obtain

$$E[X] = 1\left(\tfrac{1}{6}\right) + 2\left(\tfrac{1}{6}\right) + \cdots + 6\left(\tfrac{1}{6}\right) = \tfrac{7}{2}$$
Expectation of a Random Variable (Cont.)
• Expectation of a Bernoulli Random Variable: Calculate E[X] when X is a Bernoulli random variable with parameter p.
• Since p(0) = 1 - p and p(1) = p, we have:

$$E[X] = 0(1 - p) + 1(p) = p$$

• The expected number of successes in a single trial is just the probability that the trial will be a success.
Expectation of a Random Variable (Cont.)
• Expectation of a Binomial Random Variable: Calculate E[X] when X is binomially distributed with parameters n and p.
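A standard computation uses the identity $k\binom{n}{k} = n\binom{n-1}{k-1}$:

$$E[X] = \sum_{k=0}^{n} k \binom{n}{k} p^k (1-p)^{n-k} = np \sum_{k=1}^{n} \binom{n-1}{k-1} p^{k-1} (1-p)^{n-k} = np,$$

since the last sum equals $[p + (1-p)]^{n-1} = 1$ by the binomial theorem. The expected number of successes in n independent trials is n times the per-trial success probability.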
Expectation of a Random Variable (Cont.)
• Expectation of a Geometric Random Variable: Calculate the expectation of a geometric random variable having parameter p.
• We have:

$$E[X] = \sum_{n=1}^{\infty} n p (1-p)^{n-1} = p \sum_{n=1}^{\infty} n q^{n-1}, \qquad \text{where } q = 1 - p$$

$$E[X] = p \sum_{n=1}^{\infty} \frac{d}{dq}(q^n)
       = p\, \frac{d}{dq}\left(\sum_{n=1}^{\infty} q^n\right)
       = p\, \frac{d}{dq}\left(\frac{q}{1-q}\right)
       = \frac{p}{(1-q)^2}
       = \frac{1}{p}$$

• The expected number of independent trials we need to perform until we get our first success equals the reciprocal of the probability that any one trial results in a success.
Expectation of a Random Variable (Cont.)
• Expectation of a Poisson Random Variable: Calculate E[X] if X is a Poisson random variable with parameter λ.
• Use the identity:

$$\sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = e^{\lambda}$$
The Continuous Case
• The expected value of a continuous random variable: If X is a continuous random variable having a density function f(x), then the expected value of X is defined by:

$$E[X] = \int_{-\infty}^{\infty} x f(x)\,dx$$
• Example: Expectation of a Uniform Random Variable. Calculate the expectation of a random variable uniformly distributed over (α, β). The expected value of a random variable uniformly distributed over the interval (α, β) is just the midpoint of the interval.
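Explicitly, with f(x) = 1/(β - α) on (α, β):

$$E[X] = \int_{\alpha}^{\beta} \frac{x}{\beta - \alpha}\,dx = \frac{\beta^2 - \alpha^2}{2(\beta - \alpha)} = \frac{\alpha + \beta}{2}$$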
The Continuous Case (Cont.)
• Expectation of an Exponential Random Variable: Let X be exponentially distributed with parameter λ. Calculate E[X].

$$E[X] = \int_0^{\infty} x \lambda e^{-\lambda x}\,dx$$

• Integrating by parts (u = x, dv = λe^{-λx} dx) yields:

$$E[X] = \left[-x e^{-\lambda x}\right]_0^{\infty} + \int_0^{\infty} e^{-\lambda x}\,dx = 0 + \left[-\frac{e^{-\lambda x}}{\lambda}\right]_0^{\infty} = \frac{1}{\lambda}$$
The Continuous Case Cont’d
• Expectation of a Normal Random Variable: X is normally distributed with parameters μ and σ²:

$$E[X] = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} x\, e^{-(x-\mu)^2/2\sigma^2}\,dx$$

• Writing x as (x - μ) + μ yields:

$$E[X] = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} (x-\mu)\, e^{-(x-\mu)^2/2\sigma^2}\,dx + \mu\, \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} e^{-(x-\mu)^2/2\sigma^2}\,dx$$

• Letting y = x - μ leads to:

$$E[X] = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} y\, e^{-y^2/2\sigma^2}\,dy + \mu \int_{-\infty}^{\infty} f(x)\,dx,$$

where f(x) is the normal density. By symmetry, the first integral must be 0, and so:

$$E[X] = \mu \int_{-\infty}^{\infty} f(x)\,dx = \mu$$
Example 1
• Consider the problem of searching for a specific name in a table of names. A simple
method is to scan the table sequentially, starting from one end, until we either find
the name or reach the other end, indicating that the required name is missing from
the table. The following is a C program fragment for sequential search:
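A minimal sketch of such a fragment (assuming the usual sentinel arrangement, with a copy of myName stored in Table[n] so the scan always terminates; the exact listing is not from the slides):

    #include <string.h>

    /* Sequential search with a sentinel: Table[n] holds a copy of myName,
     * so the loop needs no separate bounds check.  Returns the index of
     * the match; a return value of n signals an unsuccessful search.
     */
    int sequential_search(char *Table[], int n, const char *myName)
    {
        int I = 0;
        /* each iteration performs one "myName != Table[I]" comparison */
        while (strcmp(myName, Table[I]) != 0)
            I++;
        return I;
    }

With this arrangement an unsuccessful search performs exactly n + 1 comparisons, matching the range of X discussed on the next slide.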
Example 1 Cont’d
• In order to analyze the time required for sequential search, let X be the
discrete random variable denoting the number of comparisons
“myName≠Table[I]” made. The set of all possible values of X is {1,2,…,n+1},
and X=n+1 for unsuccessful searches.
• It is more interesting to consider a random variable Y that denotes the number of comparisons for a successful search. The set of all possible values of Y is {1,2,…,n}. To compute the average search time for a successful search, we must specify the pmf of Y. In the absence of any specific information, let us assume that Y is uniform over its range:

$$p_Y(i) = \frac{1}{n}, \qquad 1 \le i \le n.$$

• Then:

$$E[Y] = \sum_{i=1}^{n} i\, p_Y(i) = \frac{1}{n} \cdot \frac{n(n+1)}{2} = \frac{n+1}{2}$$

• Thus, on the average, approximately half the table needs to be searched.
Example 2
• If $\alpha_i$ denotes the access probability for name Table[i], then the average successful search time is $E[Y] = \sum_i i\,\alpha_i$. E[Y] is minimized when the names in the table are in order of nonincreasing access probabilities; that is, $\alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_n$.
• Assume, for example, that the access probabilities are inversely proportional to rank:

$$\alpha_i = \frac{c}{i}, \qquad 1 \le i \le n,$$

where the constant c is determined from the normalization requirement $\sum_i \alpha_i = 1$.
• Thus:

$$c = \frac{1}{\sum_{i=1}^{n} 1/i} = \frac{1}{H_n} \simeq \frac{1}{\ln(n) + C},$$

where $H_n = \sum_{i=1}^{n} (1/i)$ is the partial sum of the harmonic series and C (≈ 0.577) is the Euler constant.
• Now, if the names in the table are ordered as above, then the average search time is:

$$E[Y] = \sum_{i=1}^{n} i\,\alpha_i = \sum_{i=1}^{n} c = \frac{n}{H_n} \simeq \frac{n}{\ln(n) + C},$$

which is considerably less than the previous value (n+1)/2, for large n.
Example 3
• Zipf's law has been used to model the distribution of Web page requests [BRES 1999]. It has been found that $p_Y(i)$, the probability of a request for the ith most popular page, is inversely proportional to i [ALME 1996, WILL 1996]:

$$p_Y(i) = \frac{c}{i}, \qquad 1 \le i \le n,$$

where n is the total number of Web pages in the universe.
• We assume the Web page requests are independent and the cache can hold only m Web pages, regardless of the size of each Web page. If we adopt a removal policy called "least frequently used", which always keeps the m most popular pages, then the hit ratio h(m), the probability that a request can find its page in the cache, is given by:

$$h(m) = \sum_{i=1}^{m} p_Y(i) = c\,H_m = \frac{H_m}{H_n} \simeq \frac{\ln(m) + C}{\ln(n) + C},$$

where c = 1/H_n by the same normalization as before.
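The following short C sketch (illustrative, not from the slides) evaluates this hit ratio by computing the harmonic sums directly:

    #include <stdio.h>

    /* k-th harmonic number H_k = 1 + 1/2 + ... + 1/k */
    static double harmonic(int k)
    {
        double h = 0.0;
        for (int i = 1; i <= k; i++)
            h += 1.0 / i;
        return h;
    }

    int main(void)
    {
        int n = 1000000;                    /* total pages in the universe */
        for (int m = 10; m < n; m *= 100)   /* cache sizes 10, 1000, 100000 */
            printf("m = %6d   h(m) = %.3f\n", m, harmonic(m) / harmonic(n));
        return 0;
    }

Because H_m grows only logarithmically, even a cache holding a tiny fraction of the n pages achieves a substantial hit ratio.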
Moments
• Let X be a random variable, and define another random variable Y as a function of X, so that Y = φ(X). Suppose that we wish to compute E[Y]:

$$E[Y] = E[\phi(X)] =
\begin{cases}
\displaystyle\sum_i \phi(x_i)\, p_X(x_i), & \text{if } X \text{ is discrete,} \\[6pt]
\displaystyle\int_{-\infty}^{\infty} \phi(x)\, f_X(x)\,dx, & \text{if } X \text{ is continuous}
\end{cases}$$

(provided the sum or the integral on the right-hand side is absolutely convergent). A special case of interest is the power function φ(X) = X^k. For k = 1, 2, 3, …, E[X^k] is known as the kth moment of the random variable X. Note that the first moment E[X] is the ordinary expectation, or mean, of X. The second central moment, defined on the next slide, is the variance.
Variance: 2nd Central Moment
• We define the kth central moment, $\mu_k$, of the random variable X by:

$$\mu_k = E[(X - E[X])^k]$$

• $\mu_2$ is known as the variance of X, Var[X], often denoted by $\sigma^2 = E[(X - E[X])^2]$.
• Definition (Variance). The variance of a random variable X is:

$$\mathrm{Var}[X] = \sigma^2 =
\begin{cases}
\displaystyle\sum_i (x_i - E[X])^2\, p(x_i), & \text{if } X \text{ is discrete,} \\[6pt]
\displaystyle\int_{-\infty}^{\infty} (x - E[X])^2 f(x)\,dx, & \text{if } X \text{ is continuous}
\end{cases}$$

• It is clear that Var[X] is always a nonnegative number.
Functions of a Random Variable
• Let Y = φ(X) = X². As an example, X could denote the measurement error in a certain physical experiment, and Y would then be the square of the error (e.g., method of least squares).
• Note that $F_Y(y) = 0$ for y < 0. For y ≥ 0:

$$\begin{aligned}
F_Y(y) &= P(Y \le y) \\
&= P(X^2 \le y) \\
&= P(-\sqrt{y} \le X \le \sqrt{y}) \\
&= F_X(\sqrt{y}) - F_X(-\sqrt{y}),
\end{aligned}$$

and by differentiation the density of Y is:

$$f_Y(y) =
\begin{cases}
\dfrac{1}{2\sqrt{y}}\left[f_X(\sqrt{y}) + f_X(-\sqrt{y})\right], & y > 0, \\[6pt]
0, & \text{otherwise.}
\end{cases}$$
Functions of a Random Variable (cont.)
• Let X have the standard normal distribution N(0,1), so that:

$$f_X(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad -\infty < x < \infty.$$

• Then:

$$f_Y(y) =
\begin{cases}
\dfrac{1}{2\sqrt{y}}\left[\dfrac{1}{\sqrt{2\pi}}\, e^{-y/2} + \dfrac{1}{\sqrt{2\pi}}\, e^{-y/2}\right], & y > 0, \\[6pt]
0, & y \le 0,
\end{cases}$$

or:

$$f_Y(y) =
\begin{cases}
\dfrac{1}{\sqrt{2\pi y}}\, e^{-y/2}, & y > 0, \\[6pt]
0, & y \le 0.
\end{cases}$$

• This is a chi-squared distribution with one degree of freedom.
Functions of a Random Variable (cont.)
• Let X be uniformly distributed on (0,1). We show that $Y = -\frac{1}{\lambda}\ln(1 - X)$ has an exponential distribution with parameter λ > 0. Observe that Y is a nonnegative random variable, implying $F_Y(y) = 0$ for y ≤ 0.
• For y > 0, we have:

$$\begin{aligned}
F_Y(y) = P(Y \le y) &= P[-\lambda^{-1} \ln(1 - X) \le y] \\
&= P[\ln(1 - X) \ge -\lambda y] \\
&= P[(1 - X) \ge e^{-\lambda y}] \qquad (\text{since } e^x \text{ is an increasing function of } x) \\
&= P(X \le 1 - e^{-\lambda y}) \\
&= F_X(1 - e^{-\lambda y}).
\end{aligned}$$

But since X is uniform over (0,1), $F_X(x) = x$ for 0 ≤ x ≤ 1. Thus $F_Y(y) = 1 - e^{-\lambda y}$. Therefore Y is exponentially distributed with parameter λ.
• This fact can be used in a distribution-driven simulation. In simulation programs it is important to be able to generate values of variables with known distribution functions. Such values are known as random deviates or random variates. Most computer systems provide built-in functions to generate random deviates from the uniform distribution over (0,1), say u. Such random deviates are called random numbers.
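A minimal C sketch of this inverse-transform method (the standard library's rand() is used purely for illustration; a serious simulation would substitute a better uniform generator):

    #include <stdlib.h>
    #include <math.h>

    /* Exponential random deviate with rate lambda, via
     * Y = -(1/lambda) * ln(1 - U), where U is (approximately)
     * uniform on (0,1).
     */
    double exp_deviate(double lambda)
    {
        /* shift/scale keeps u strictly inside (0,1), avoiding log(0) */
        double u = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
        return -log(1.0 - u) / lambda;
    }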
Example 1
• Let X be uniformly distributed on (0,1). We obtain the cumulative distribution function (CDF) of the random variable Y, defined by Y = Xⁿ, as follows: for 0 ≤ y ≤ 1,

$$\begin{aligned}
F_Y(y) &= P\{Y \le y\} \\
&= P\{X^n \le y\} \\
&= P\{X \le y^{1/n}\} \\
&= F_X(y^{1/n}) \\
&= y^{1/n}
\end{aligned}$$

• Now, the probability density function (PDF) of Y is given by:

$$f_Y(y) =
\begin{cases}
\dfrac{1}{n}\, y^{\frac{1}{n} - 1}, & 0 \le y \le 1, \\[6pt]
0, & \text{otherwise.}
\end{cases}$$
Expectation of a Function of a Random Variable
• Given a random variable X and its probability distribution or its pmf/pdf, we are interested in calculating not the expected value of X, but the expected value of some function of X, say, g(X).
• One way: since g(X) is itself a random variable, it must have a probability distribution, which should be computable from a knowledge of the distribution of X. Once we have obtained the distribution of g(X), we can then compute E[g(X)] by the definition of the expectation.
• Example 1: Suppose X has the following probability mass function:

$$p(0) = 0.2, \qquad p(1) = 0.5, \qquad p(2) = 0.3$$

Calculate E[X²].
• Letting Y = X², we have that Y is a random variable that can take on one of the values 0², 1², 2² with respective probabilities:

$$p_Y(0) = P\{Y = 0^2\} = 0.2, \qquad p_Y(1) = P\{Y = 1^2\} = 0.5, \qquad p_Y(4) = P\{Y = 2^2\} = 0.3$$

• Hence:

$$E[X^2] = E[Y] = 0(0.2) + 1(0.5) + 4(0.3) = 1.7$$

• Note that:

$$1.7 = E[X^2] \ne (E[X])^2 = 1.21$$
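Combining the two computed moments also gives the variance through the shortcut formula:

$$\mathrm{Var}[X] = E[X^2] - (E[X])^2 = 1.7 - (1.1)^2 = 0.49$$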
Expectation of a Function of a Random Variable (cont.)
• Proposition 2: (a) If X is a discrete random variable with probability mass function p(x), then for any real-valued function g:

$$E[g(X)] = \sum_{x:\,p(x) > 0} g(x)\, p(x)$$

• (b) If X is a continuous random variable with probability density function f(x), then for any real-valued function g:

$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$$

• Example 3: Applying the proposition to Example 1 yields:

$$E[X^2] = 0^2(0.2) + 1^2(0.5) + 2^2(0.3) = 1.7$$

• Example 4: Applying the proposition to Example 2 yields:

$$E[X^3] = \int_0^1 x^3\,dx = \frac{1}{4} \qquad (\text{since } f(x) = 1,\ 0 \le x \le 1)$$
Corollary
• If a and b are constants, then E[aX + b] = aE[X] + b.
• The discrete case:

$$E[aX + b] = \sum_{x:\,p(x)>0} (ax + b)\, p(x) = a \sum_{x:\,p(x)>0} x\, p(x) + b \sum_{x:\,p(x)>0} p(x) = aE[X] + b$$

• The continuous case:

$$E[aX + b] = \int_{-\infty}^{\infty} (ax + b) f(x)\,dx = a \int_{-\infty}^{\infty} x f(x)\,dx + b \int_{-\infty}^{\infty} f(x)\,dx = aE[X] + b$$
Moments
• The expected value of a random variable X, E[X], is also
referred to as the mean or the first moment of X.
• The quantity E[Xⁿ], n ≥ 1, is called the nth moment of X. We have:

$$E[X^n] =
\begin{cases}
\displaystyle\sum_{x:\,p(x)>0} x^n\, p(x), & \text{if } X \text{ is discrete,} \\[6pt]
\displaystyle\int_{-\infty}^{\infty} x^n f(x)\,dx, & \text{if } X \text{ is continuous}
\end{cases}$$

• Another quantity of interest is the variance of a random variable X, denoted by Var(X), which is defined by:

$$\mathrm{Var}(X) = E[(X - E[X])^2]$$
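Expanding the square and applying the corollary E[aX + b] = aE[X] + b yields the equivalent form that is usually easier to compute (as in the E[X²] example earlier):

$$\mathrm{Var}(X) = E[X^2] - (E[X])^2$$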