Chapter 4
DeGroot & Schervish
Variance
 Although the mean of a distribution is a useful summary, it
does not convey very much information about the
distribution.
 A random variable X with mean 2 has the same mean as the
constant random variable Y such that Pr(Y = 2) = 1
 even if X is not constant!
 To distinguish the distribution of X from the distribution of
Y in this case, it might be useful to give some measure of
how spread out the distribution of X is.
 The variance of X is one such measure.
 The standard deviation of X is the square root of the
variance.
Stock Price Changes
 Consider the prices A and B of two stocks at a time one
month in the future.
 Assume that A has the uniform distribution on the
interval [25, 35] and B has the uniform distribution on
the interval [15, 45].
 Both stocks have a mean price of 30. But the
distributions are very different.
Stock Price Changes
Variance/Standard Deviation
 Let X be a random variable with finite mean μ = E(X).
 The variance of X, denoted by Var(X), is defined as follows:
Var(X) = E[(X − μ)²].
 The standard deviation of X is the nonnegative square root of Var(X) if the variance exists.
 When only one random variable is being discussed, it is common to denote its standard deviation by the symbol σ and its variance by σ².
Stock Price Changes
 Return to the two random variables A and B in the stock price example; their variances are worked out below.
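Using the fact that a uniform distribution on the interval [a, b] has variance (b − a)²/12:
Var(A) = (35 − 25)²/12 = 100/12 ≈ 8.33, so σA ≈ 2.89.
Var(B) = (45 − 15)²/12 = 900/12 = 75, so σB ≈ 8.66.
Even though both prices have mean 30, the distribution of B is far more spread out than that of A.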
Variance and Standard Deviation of a
Discrete Distribution
 Suppose that a random variable X can take each of the five
values −2, 0, 1, 3, and 4 with equal probability.
 E(X) = (1/5)(−2 + 0 + 1 + 3 + 4) = 1.2.
 Let W = (X − μ)², so that Var(X) = E(W) = (1/5)[(−3.2)² + (−1.2)² + (−0.2)² + (1.8)² + (2.8)²] = 4.56.
Properties of the Variance
 Theorem:
 Var(X) = 0 if and only if there exists a constant c such
that Pr(X = c) = 1.
Properties of the Variance
 Theorem:
 For constants a and b,
 Y = aX + b,
 Var(Y) = a² Var(X), and σY = |a| σX.
Calculating the Variance and Standard
Deviation of a Linear Function
 Suppose that a random variable X can take each of the
five values −2, 0, 1, 3, and 4 with equal probability.
 Determine the variance and standard deviation of
Y = 4X − 7.
 The mean of X is μ = 1.2 and the variance is Var(X) = 4.56.
 Var(Y) = 16 Var(X) = 72.96.
 Also, the standard deviation of Y is
 σY = 4σX = 4(4.56)^(1/2) ≈ 8.54.
 For every random variable X, Var(X) = E(X²) − [E(X)]².
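A quick numerical check of the last two results (a minimal sketch using numpy; the five values and probabilities are those of the example above):

import numpy as np

# the five equally likely values of X from the example
x = np.array([-2, 0, 1, 3, 4], dtype=float)
p = np.full(5, 0.2)

mu = np.sum(p * x)                        # E(X) = 1.2
var_x = np.sum(p * (x - mu) ** 2)         # E[(X - mu)^2] = 4.56
shortcut = np.sum(p * x ** 2) - mu ** 2   # E(X^2) - [E(X)]^2, also 4.56

y = 4 * x - 7                             # Y = 4X - 7
var_y = np.sum(p * (y - np.sum(p * y)) ** 2)   # 16 * Var(X) = 72.96

print(mu, var_x, shortcut, var_y, np.sqrt(var_y))   # 1.2 4.56 4.56 72.96 ~8.54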
Theorem
 If X1, . . . , Xn are independent random variables with finite means,
then
 Var(X1 + . . . + Xn) = Var(X1) + . . . + Var(Xn).
The Variance of a Binomial Distribution
 Suppose that a box contains red balls and blue balls,
and that the proportion of red balls is p (0 ≤ p ≤ 1).
 Suppose that n balls are selected from the box with replacement.
 For i = 1, . . . , n, let Xi = 1 if the ith ball that is selected
is red, and let Xi = 0 otherwise.
 If X denotes the total number of red balls in the
sample, then
 X = X1 + . . . + Xn and X will have the binomial
distribution with parameters n and p.
 Since X1, . . . , Xn are independent, it follows from the theorem above that Var(X) = Var(X1) + . . . + Var(Xn).
 E(Xi) = p for i = 1, . . . , n. Since Xi² = Xi for each i, E(Xi²) = E(Xi) = p.
 Var(Xi) = E(Xi²) − [E(Xi)]² = p − p² = p(1 − p).
 Therefore, Var(X) = np(1 − p).
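As a quick numerical check (a sketch in plain Python; the parameters n = 10 and p = 0.3 are illustrative only), the variance computed directly from the binomial p.m.f. matches np(1 − p):

import math

n, p = 10, 0.3
pmf = [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))                # n*p = 3.0
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))   # n*p*(1-p) = 2.1
print(mean, var, n * p * (1 - p))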
Moments
 For a random variable X, the means of the powers X^k (called moments) for k > 2 have useful theoretical properties, and some of them are used for additional summaries of a distribution.
 The moment generating function is a related tool
Existence of Moments
 For each random variable X and every positive integer k, the expectation E(X^k) is called the kth moment of X.
 In particular, in accordance with this terminology, the
mean of X is the first moment of X.
Existence of Moments
 Suppose that X is a random variable for which E(X)=μ.
 For every positive integer k, the expectation E[(X − μ)^k] is called the kth central moment of X or the kth moment of X about the mean.
 In particular, in accordance with this terminology, the
variance of X is the second central moment of X.
Moment Generating Functions
 Let X be a random variable. For each real number t,
 ψ(t) = E(e^(tX)).
 The function ψ(t) is called the moment generating
function (abbreviated m.g.f.) of X.
 The Moment Generating Function of X Depends Only
on the Distribution of X:
 Since the m.g.f. is the expected value of a function of X,
it must depend only on the distribution of X.
 If X and Y have the same distribution, they must have
the same m.g.f.
Theorem
 Let X be a random variable whose m.g.f. ψ(t) is finite for all values of t in some open interval around the point t = 0.
 Then, for each integer n > 0, the nth moment of X, E(X^n), is finite and equals the nth derivative ψ^(n)(t) at t = 0. That is, E(X^n) = ψ^(n)(0) for n = 1, 2, . . . .
Example
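For instance, here is a minimal sketch assuming X has the exponential distribution with rate λ = 2, whose m.g.f. is ψ(t) = λ/(λ − t) for t < λ; differentiating at t = 0 recovers the moments E(X^n) = n!/λ^n:

import sympy as sp

t = sp.symbols('t')
lam = 2                              # assumed rate of the exponential distribution
psi = lam / (lam - t)                # m.g.f. of an exponential(lam) random variable, valid for t < lam

for n in range(1, 4):
    moment = sp.diff(psi, t, n).subs(t, 0)
    print(n, moment)                 # prints 1/2, 1/2, 3/4, i.e. n!/lam**n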
Properties of Moment Generating
Functions
 Theorem
 Let X be a random variable for which the m.g.f. is ψ1; let Y = aX + b, where a and b are given constants; and let ψ2 denote the m.g.f. of Y. Then for every value of t such that ψ1(at) is finite,
 ψ2(t) = e^(bt) ψ1(at).
Example
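One possible illustration (a sketch assuming X is Bernoulli with parameter p, so its m.g.f. is ψ1(t) = p e^t + 1 − p): for Y = aX + b the theorem gives ψ2(t) = e^(bt) ψ1(at), which the check below compares with the m.g.f. of Y computed directly:

import numpy as np

p, a, b = 0.3, 2.0, 5.0
t = np.linspace(-1.0, 1.0, 5)

psi1 = lambda s: p * np.exp(s) + (1 - p)            # m.g.f. of the assumed Bernoulli(p) X
psi2_direct = (1 - p) * np.exp(t * b) + p * np.exp(t * (a + b))   # E(e^(tY)) over Y's two values
psi2_theorem = np.exp(b * t) * psi1(a * t)          # e^(bt) * psi1(at)

print(np.allclose(psi2_direct, psi2_theorem))       # True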
Theorem
 Suppose that X1, . . . , Xn are n independent random
variables; and for i = 1, . . . , n, let ψi denote the m.g.f.
of Xi .
 Let Y = X1 + . . . + Xn, and let the m.g.f. of Y be denoted by ψ. Then for every value of t such that ψi(t) is finite for i = 1, . . . , n,
 ψ(t) = ψ1(t) ψ2(t) · · · ψn(t).
Proof
 Since X1, . . . , Xn are independent, the random variables e^(tX1), . . . , e^(tXn) are also independent, so ψ(t) = E(e^(tY)) = E(e^(tX1) · · · e^(tXn)) = E(e^(tX1)) · · · E(e^(tXn)) = ψ1(t) · · · ψn(t).
The Moment Generating Function
for the Binomial Distribution
 Suppose that a random variable X has the binomial
distribution with parameters n and p.
 The mean and the variance of X are determined by
representing X as the sum of n independent random
variables X1, . . . , Xn.
 The distribution of each variable Xi is as follows:
 Pr(Xi = 1) = p and Pr(Xi = 0) = 1− p.
 Now use this representation to determine the m.g.f. of
X = X1 + . . . + Xn.
The Moment Generating Function
for the Binomial Distribution
 Since Pr(Xi = 1) = p and Pr(Xi = 0) = 1 − p, the m.g.f. of each Xi is ψi(t) = E(e^(tXi)) = p e^t + (1 − p).
 Because X1, . . . , Xn are independent, the preceding theorem gives ψ(t) = (p e^t + 1 − p)^n for all t.
Uniqueness of Moment Generating
Functions
 Theorem
 If the m.g.f.’s of two random variables X1 and X2 are
finite and identical for all values of t in an open
interval around the point t = 0, then the probability
distributions of X1 and X2 must be identical.
The Additive Property of the
Binomial Distribution
 If X1 and X2 are independent random variables, and if
Xi has the binomial distribution with parameters ni
and p (i = 1, 2), then X1 + X2 has the binomial
distribution with parameters n1 + n2 and p.
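A quick numerical check of this additive property (a sketch using numpy; the p.m.f. of a sum of independent variables is the convolution of their p.m.f.s, and the parameters n1 = 3, n2 = 5, p = 0.4 are illustrative only):

import numpy as np
from math import comb

def binom_pmf(n, p):
    # p.m.f. of the binomial distribution with parameters n and p
    return np.array([comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)])

n1, n2, p = 3, 5, 0.4
pmf_sum = np.convolve(binom_pmf(n1, p), binom_pmf(n2, p))   # distribution of X1 + X2
print(np.allclose(pmf_sum, binom_pmf(n1 + n2, p)))          # True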
The Mean and the Median
 Although the mean of a distribution is a measure of central location, the median is another measure of central location.
 Let X be a random variable.
 Every number m with the following property is called a
median of the distribution of X:
Pr(X ≤ m) ≥ 1/2 and Pr(X ≥ m) ≥ 1/2.
 Indeed, the 1/2 quantile is a median.
Example
 The Median of a Discrete Distribution:
 Suppose that X has the following discrete distribution:
 Pr(X = 1) = 0.1, Pr(X = 2) = 0.2,
 Pr(X = 3) = 0.3, Pr(X = 4) = 0.4.
 The value 3 is a median of this distribution because
Pr(X ≤ 3) = 0.6, which is greater than 1/2, and Pr(X ≥ 3)
= 0.7, which is also greater than 1/2.
 Furthermore, 3 is the unique median of this
distribution.
Example
 A Discrete Distribution for Which the Median Is Not
Unique:
 Suppose that X has the following discrete distribution:
 Pr(X = 1) = 0.1, Pr(X = 2) = 0.4,
 Pr(X = 3) = 0.3, Pr(X = 4) = 0.2.
 Pr(X ≤ 2) = 1/2, and Pr(X ≥ 3) = 1/2. Therefore, every
value of m in the closed interval 2 ≤ m ≤ 3 will be a
median of this distribution.
 The most popular choice of median of this distribution
would be the midpoint 2.5.
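Both discrete examples can be checked mechanically. The helper below is an illustrative sketch (not part of the text); it returns every support point m satisfying Pr(X ≤ m) ≥ 1/2 and Pr(X ≥ m) ≥ 1/2:

def medians(values, probs):
    # all support points m with Pr(X <= m) >= 1/2 and Pr(X >= m) >= 1/2
    return [m for m in values
            if sum(p for v, p in zip(values, probs) if v <= m) >= 0.5
            and sum(p for v, p in zip(values, probs) if v >= m) >= 0.5]

print(medians([1, 2, 3, 4], [0.1, 0.2, 0.3, 0.4]))   # [3]
print(medians([1, 2, 3, 4], [0.1, 0.4, 0.3, 0.2]))   # [2, 3]; the full median set is the interval [2, 3]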
Example
 The Median of a Continuous Distribution.
 Suppose that X has a continuous distribution for which
the p.d.f. is as follows:
Mean Squared Error/M.S.E
 Suppose that X is a random variable with mean μ and variance σ².
 Suppose also that the value of X is to be observed in some experiment, but this value must be predicted before the observation can be made.
 One basis for making the prediction is to select some number d for which the expected value of the square of the error X − d will be a minimum.
 The number E[(X − d)²] is called the mean squared error (M.S.E.) of the prediction d.
 The number d for which the M.S.E. is minimized is E(X), since E[(X − d)²] = σ² + (μ − d)², which is smallest when d = μ.
Mean Absolute Error/M.A.E.
 Another possible basis for predicting the value of a
random variable X is to choose some number d for
which E(|X − d|) will be a minimum.
 The M.A.E. is minimized when the chosen value of d is
a median of the distribution of X.
Predicting a Discrete Uniform Random
Variable.
 Suppose that the probability is 1/6 that a random variable X will take each of the following six values: 1, 2, 3, 4, 5, 6.
 Determine the prediction for which the M.S.E. is minimum and the prediction for which the M.A.E. is minimum.
 In this example, E(X) = (1/6)(1 + 2 + 3 + 4 + 5 + 6) = 3.5.
 Therefore, the M.S.E. will be minimized by the unique value d = 3.5.
 Also, every number m in the closed interval 3 ≤ m ≤ 4 is a median of the given distribution. Therefore, the M.A.E. will be minimized by every value of d such that 3 ≤ d ≤ 4.
 Because the distribution of X is symmetric, the mean of X is also a median of X.
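A numerical look at both criteria for this example (a sketch using numpy; the grid of candidate predictions d is arbitrary):

import numpy as np

x = np.arange(1, 7)                       # the six equally likely values of X
d = np.linspace(2.0, 5.0, 301)            # candidate predictions

mse = ((x[None, :] - d[:, None]) ** 2).mean(axis=1)   # E[(X - d)^2] for each d
mae = np.abs(x[None, :] - d[:, None]).mean(axis=1)    # E(|X - d|) for each d

print(d[np.argmin(mse)])                  # 3.5, the mean
print(d[np.isclose(mae, mae.min())])      # every grid point in [3, 4]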
Covariance and Correlation
 When we are interested in the joint distribution of two
random variables, it is useful to have a summary of
how much the two random variables depend on each
other.
 The covariance and correlation are attempts to
measure that dependence, but they only capture a
particular type of dependence, namely linear
dependence.
Covariance
 Let X and Y be random variables having finite means.
Let E(X) = μX and E(Y) = μY .
 The covariance of X and Y, which is denoted by
Cov(X,Y), is defined as
 Cov(X, Y ) = E[(X − μX)(Y − μY )]
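For a discrete pair, Cov(X, Y) can be computed directly from the joint p.m.f. A minimal sketch with an assumed joint table (illustrative values only):

import numpy as np

x_vals = np.array([0.0, 1.0])
y_vals = np.array([0.0, 1.0, 2.0])
joint = np.array([[0.10, 0.20, 0.10],
                  [0.20, 0.10, 0.30]])    # assumed joint p.m.f.; rows index X, columns index Y

px = joint.sum(axis=1)                    # marginal p.m.f. of X
py = joint.sum(axis=0)                    # marginal p.m.f. of Y
mu_x = np.sum(px * x_vals)
mu_y = np.sum(py * y_vals)

exy = np.sum(joint * np.outer(x_vals, y_vals))    # E(XY)
print(exy - mu_x * mu_y)                          # Cov(X, Y) = E(XY) - E(X)E(Y)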
Example
 Let X and Y have the joint p.d.f. f:
Theorem
 For all random variables X and Y
 Cov(X, Y ) = E(XY) − E(X)E(Y).
 Proof:
Cov(X, Y) = E[(X − μX)(Y − μY)] = E(XY − μX Y − μY X + μX μY)
= E(XY) − μX E(Y) − μY E(X) + μX μY
= E(XY) − μX μY = E(XY) − E(X)E(Y).
Correlation
 Let X and Y be random variables with finite variances σX² and σY², respectively.
 Then the correlation of X and Y, which is denoted by ρ(X, Y), is defined as follows:
ρ(X, Y) = Cov(X, Y) / (σX σY).
Theorem
Properties of Covariance and
Correlation
 If X and Y are independent random variables with finite, positive variances, then
 Cov(X, Y) = ρ(X, Y) = 0.
 Proof: If X and Y are independent, then
 E(XY) = E(X)E(Y),
 so by the previous theorem Cov(X, Y) = 0.
 It follows that ρ(X, Y) = 0 as well.
Theorem
 Suppose that X is a random variable with 0 < Var(X) < ∞, and let Y = aX + b for constants a ≠ 0 and b.
 If a > 0, then ρ(X, Y) = 1.
 If a < 0, then ρ(X, Y) = −1.
 Proof: Cov(X, Y) = E[(X − μX) · a(X − μX)] = a Var(X) = a σX². Since σY = |a|σX, the correlation equation gives ρ(X, Y) = a σX² / (σX |a| σX) = a/|a|.
Theorem
 If X and Y are random variables with finite variances, then
 Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y).
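A numerical check of this identity (a sketch re-using an assumed small joint p.m.f.; the table is illustrative only):

import numpy as np

x_vals = np.array([0.0, 1.0])
y_vals = np.array([0.0, 1.0, 2.0])
joint = np.array([[0.10, 0.20, 0.10],
                  [0.20, 0.10, 0.30]])    # assumed joint p.m.f. of (X, Y)

xs, ys = np.meshgrid(x_vals, y_vals, indexing="ij")   # value grids aligned with the table

def ev(g):
    # expectation of a function of (X, Y) under the joint p.m.f.
    return np.sum(joint * g)

var_x = ev(xs ** 2) - ev(xs) ** 2
var_y = ev(ys ** 2) - ev(ys) ** 2
cov = ev(xs * ys) - ev(xs) * ev(ys)
var_sum = ev((xs + ys) ** 2) - ev(xs + ys) ** 2

print(np.isclose(var_sum, var_x + var_y + 2 * cov))   # True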
Theorem