Random variables: variance
March 15, 2009
1 Definition
In the last handout we defined expectation, a plug-in for the average. The expectation of X is analogous to
the center of mass: in mechanics, we model an object as a point mass located at its center of mass. In
probability, we hope that the expectation somehow captures the behavior of the random variable, though
as of now, we have not been able to say exactly how.
But the center of mass does not capture all behavior of the object; it cannot help you figure out how
the object will rotate. To model rotation, there is a different concept, the moment of inertia. Similarly, in
probability, expectation cannot tell you how the variable is spread around the expectation. For example, a
Binomial(1000, .1), a Geometric(.01) and a constant distribution that assigns probability 1 to the value 100 (and 0 elsewhere) all
have expectation 100, but the constant distribution gives non-zero probability to only 1 value, the binomial
random variable can be one of 1001 values, and the geometric random variable can take on infinitely many values.
We introduce the notion of variance to specify how the expectation captures the behavior of the random
variable, and to describe the spread of the random variable. Specifically, we refer to (??). Recall that for
notational convenience, we denote the set of all values a random variable X can take by X(Ω), and if we
consider a real valued function g of X,
    E g(X) = \sum_{x \in X(\Omega)} g(x) P(X = x).
The variance of X is the expectation of g(X), where g(x) = (x − EX)^2. Therefore,

    var(X) = E g(X) = E(X − EX)^2 = \sum_{x \in X(\Omega)} (x − EX)^2 P(X = x).
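As a quick worked example, let X be the value of one roll of a fair six-sided die, so that EX = 7/2. Directly from the definition,

    var(X) = \sum_{x=1}^{6} \left( x − \tfrac{7}{2} \right)^2 \cdot \tfrac{1}{6} = \frac{2\left(2.5^2 + 1.5^2 + 0.5^2\right)}{6} = \frac{35}{12} \approx 2.92.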
HW 1 Show that for all real values of c, var(X + c) = var(X). Hint: first show that E(X + c) = EX + c.
HW 2 Show that var(X) = EX^2 − (EX)^2. (Notation: EX^2 is the expectation of X^2. The square of the
expectation of X will always be written as (EX)^2.)

2 Bernoulli and Binomial random variables
If X is a Bernoulli p random variable, then EX^2 = 1^2 \cdot p + 0^2 \cdot (1 − p) = p, and therefore

    var(X) = EX^2 − (EX)^2 = p − p^2 = p(1 − p).    (1)
Suppose we have n independent Bernoulli p trials, the ith trial being X_i. As we saw before, the random
variable
Y = X1 + X2 + . . . + Xn
is a Binomial random variable with parameters n and p, and EY = np. To compute the variance of Y , we
use the identity
    (X_1 + X_2 + \cdots + X_n)^2 = \sum_{i=1}^{n} X_i^2 + 2 \sum_{i<j} X_i X_j,

where

    \sum_{i<j} X_i X_j = \sum_{i=1}^{n} \sum_{j=i+1}^{n} X_i X_j.
Note that

    EX_i^2 = 1^2 \cdot p = p,

while, since X_i and X_j are independent for i \ne j,

    EX_i X_j = P(X_i = 1, X_j = 1) = p^2.
Therefore,

    EY^2 = E(X_1 + X_2 + \cdots + X_n)^2 = E\left[\sum_{i=1}^{n} X_i^2\right] + E\left[2 \sum_{i<j} X_i X_j\right] = \sum_{i=1}^{n} EX_i^2 + 2 \sum_{i<j} EX_i X_j = np + 2 \binom{n}{2} p^2.

It follows that

    var(Y) = EY^2 − (EY)^2 = np + n(n−1)p^2 − (np)^2 = np − np^2 = np(1 − p).    (2)
Observe from (2) and (1) that the variance of Y is the sum of variances of Xi in this particular case.
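If you want a quick numerical sanity check of (2), the following is a minimal MATLAB sketch (the parameter values are illustrative): it simulates Binomial(n, p) samples as sums of Bernoulli trials and compares their sample variance with np(1 − p).

    % Minimal sketch: estimate var(Binomial(n,p)) by simulation and compare with n*p*(1-p).
    n = 1000; p = 0.1; trials = 10000;      % illustrative values
    Y = sum(rand(trials, n) < p, 2);        % each row holds n independent Bernoulli(p) trials
    [var(Y), n*p*(1-p)]                     % sample variance vs. the formula np(1-p)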
HW 3 Since
(X1 + X2 + . . . + Xn )2 = X1 (X1 + X2 + . . . + Xn ) + X2 (X1 + X2 + . . . + Xn ) + . . . + Xn (X1 + X2 + . . . + Xn ),
show that

    (X_1 + X_2 + \cdots + X_n)^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} X_i X_j,

and therefore that

    (X_1 + X_2 + \cdots + X_n)^2 = \sum_{i=1}^{n} X_i^2 + \sum_{i=1}^{n} \sum_{j \ne i} X_i X_j.

Why is

    \sum_{i=1}^{n} \sum_{j \ne i} X_i X_j = 2 \sum_{i<j} X_i X_j ?
HW 4 You are now ready for a real-life scenario. If there is only one thing you remember from your 4 years
of engineering undergrad, I think you should remember this principle: with most real problems, there is a
lot of value in modeling the situation appropriately, even if the model is not completely accurate. We will
encounter such a situation here.
Suppose you want to estimate the proportion of defective chips produced at a particular chip fabrication
unit. You know N is the total number of chips produced at this fabrication unit. You do not have the
resources to test all N chips. So you pick n ≪ N chips without replacement. Each chip is chosen at random
from the chips remaining at that point, independently of the test results on previous chips.
You test these n chips. From these test results you have to figure out what proportion of the N chips
are defective. Let m be the (unknown) number of defective chips—so you want to estimate m/N .
Suppose X_i denotes whether your ith chip is defective: if the ith chip is defective, X_i = 1, else X_i = 0. Since
the first chip is picked at random among the N chips, P(X_1 = 1) = m/N. The second chip is now
chosen randomly among the remaining N − 1 chips.
1. If X1 = 1, what is the probability X2 = 1? In other words, what is P (X2 = 1|X1 = 1)?
2. If X_1 = 0, what is the probability X_2 = 1? Namely, what is P(X_2 = 1|X_1 = 0)?
3. What is P (X2 = 1)?
4. Are X1 and X2 independent?
If N is very large, note that P(X_2 = 1|X_1 = 0) ≈ P(X_2 = 1|X_1 = 1) ≈ P(X_2 = 1). Furthermore, convince
yourself that if n ≪ N, then it is a good approximation to consider X_1, . . . , X_n to be independent Bernoulli
m/N variables.
Optional: how can you prove this is a good approximation? The starting point is to prove that if n is
sufficiently smaller than m, with high confidence, even if you chose n chips with replacement, you would not
see repeats (as if you sampled without replacement)—how would you continue? See also the optional question
at the end of Subproblem 10.
Your estimate p̂ of m/N is
    \hat{p} = \frac{\sum_{i=1}^{n} X_i}{n},
namely, you estimate what proportion of the n chips you sampled are defective. You need to estimate m/N
to an accuracy of ±.03 and you want to be 90% confident of your answer. The next 5 parts help you do this:
5. What is E p̂?
6. Show that the variance of your estimate \hat{p} is

       var(\hat{p}) = \frac{m(N − m)}{nN^2}.
7. For all 0 ≤ x ≤ 1, 0 ≤ x(1 − x) ≤ 1/4. Therefore show that

       var(\hat{p}) \le \frac{1}{4n}.
8. Therefore, the standard deviation of your estimate is at most \frac{1}{2\sqrt{n}}. Recall that you need to estimate
m/N to an accuracy of ±.03 and you want to be 90% confident of your answer. What should be n,
the number of chips that you must test? Hint: Use (??) to first find k from the confidence probability
.9. Then use the accuracy value .03 and k to find n. (A sketch of this calculation appears after this problem.)
9. You will need to verify at this point that m ≫ n. Since you know N and you have the estimate \hat{p},
what is your estimate of m? You are definitely in trouble if the estimate of m so obtained is not much
greater than n. But suppose the estimate of m is sufficiently larger than n.
10. Are you done? Can anything else go wrong?
(Optional) The confidence interval obtained doesn't take into account that we approximated independence. How
would you take that into account? Suppose we choose the chips with replacement and let X_i = 1 denote
the event that the ith choice is defective. Are all X_i independent now? If yes, how would you proceed? If
not, what (precise) conditions on N, n, m make them approximately independent, and how would you derive
confidence intervals?
The two optional parts can bump you up by up to a grade, and they are not as hard as the optional
problem in Handout 7.
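For subproblem 8, here is a minimal MATLAB sketch of the sample-size calculation. It assumes that the bound referenced by (??) is a Chebyshev-type inequality, P(|p̂ − E p̂| ≥ kσ) ≤ 1/k^2; if (??) is a different bound, the same steps apply with the corresponding k.

    % Hedged sketch for subproblem 8, assuming the referenced bound (??) is
    % Chebyshev's inequality: P(|p_hat - E p_hat| >= k*sigma) <= 1/k^2.
    conf = 0.90; acc = 0.03;
    k = sqrt(1/(1 - conf));          % 1/k^2 = 0.1, so k is about 3.16
    % We need k*sigma <= acc, and sigma <= 1/(2*sqrt(n)) from subproblem 7:
    n = ceil((k / (2*acc))^2)        % roughly 2778 chips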
3 Correlation

When computing the variance of a Binomial variable,

    Y = X_1 + X_2 + \cdots + X_n,

we observed that the variance of Y was the sum of the variances of the X_i since the X_i were independent. Here we will
examine a general condition under which the variances add up.
The correlation between two random variables X_1 and X_2 is defined as

    corr(X_1, X_2) = EX_1 X_2 − (EX_1)(EX_2).    (3)

We are using a definition here that is different from the usual definition in statistics and probability. The
quantity in (3) is called covariance in these fields. They define a quantity ρ between X_1 and X_2 as

    ρ(X_1, X_2) = \frac{EX_1 X_2 − (EX_1)(EX_2)}{\sqrt{var(X_1)\,var(X_2)}}.    (4)

We will call ρ the correlation coefficient. Note that the definition above only differs from (3) by a normalization
constant. Because of the normalization, it can be shown that −1 ≤ ρ ≤ 1, while correlation as we
defined it can take any real value.
The reason we use the definition (3) is to coincide with the notion of autocorrelation (defined as the
correlation between samples of a signal at two different instants) used in signal processing and communication. Autocorrelation determines how much power a signal has in any given frequency—so an estimate of
autocorrelation is a very important design parameter for a communication engineer.
In this course, we will always use (3) to mean correlation, and ρ to mean correlation coefficient (instead
of covariance for (3) and correlation for (4) as is common in the statistics and math literature). But always
be clear about what is meant (with or without the normalization).
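To see the two conventions side by side numerically, here is a minimal MATLAB sketch (the data x, y are arbitrary illustrative samples, and sample averages stand in for expectations):

    % Minimal sketch: correlation as in (3) vs. correlation coefficient as in (4),
    % estimated from samples x and y (illustrative data only).
    x = randn(1000, 1); y = x + randn(1000, 1);
    c   = mean(x .* y) - mean(x) * mean(y);   % correlation in the sense of (3)
    rho = c / sqrt(var(x, 1) * var(y, 1));    % normalized version (4), lies in [-1, 1]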
If the correlation between X1 and X2 is 0, then the variables are said to be uncorrelated. When we have
multiple random variables, X_1, . . . , X_n, if we say the variables are uncorrelated, we mean that for all i ≠ j,
Xi and Xj are uncorrelated.
In this section X1 , . . . ,Xn are not necessarily independent, and
Y = X1 + X2 + . . . + Xn .
Observe that

    EY^2 = E(X_1 + X_2 + \cdots + X_n)^2 = E\left[\sum_{i=1}^{n} X_i^2\right] + E\left[\sum_{i \ne j} X_i X_j\right] = \sum_{i=1}^{n} EX_i^2 + \sum_{i \ne j} EX_i X_j.

If for all i and j, EX_i X_j = EX_i EX_j, then

    EY^2 = \sum_{i=1}^{n} EX_i^2 + \sum_{i \ne j} EX_i X_j = \sum_{i=1}^{n} EX_i^2 + \sum_{i \ne j} EX_i EX_j,    (5)

while

    (EY)^2 = \left(E \sum_{i=1}^{n} X_i\right)^2 = \left(\sum_{i=1}^{n} EX_i\right)^2 = \sum_{i=1}^{n} (EX_i)^2 + \sum_{i \ne j} EX_i EX_j.    (6)

Therefore from (5) and (6),

    var(Y) = EY^2 − (EY)^2
           = \sum_{i=1}^{n} EX_i^2 + \sum_{i \ne j} EX_i EX_j − \left( \sum_{i=1}^{n} (EX_i)^2 + \sum_{i \ne j} EX_i EX_j \right)
           = \sum_{i=1}^{n} \left( EX_i^2 − (EX_i)^2 \right)
           = \sum_{i=1}^{n} var(X_i).

Therefore, if X_i and X_j are uncorrelated for all pairs i and j, where i ≠ j, then

    var(Y) = \sum_{i=1}^{n} var(X_i).
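As an aside, the same conclusion is easy to check numerically. The following minimal MATLAB sketch (parameters are illustrative) generates independent, hence uncorrelated, Bernoulli columns and compares the variance of their sum with the sum of their variances:

    % Minimal sketch: for independent (hence uncorrelated) X_i, var(sum) matches sum of variances.
    n = 5; p = 0.25; trials = 20000;            % illustrative values
    X = double(rand(trials, n) < p);            % columns are independent Bernoulli(p) samples
    [var(sum(X, 2)), sum(var(X))]               % both should be close to n*p*(1-p)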
4 Correlation and Independence
Independence, or even pairwise independence, between variables implies that the variables are uncorrelated.
The reverse, however, is not true: uncorrelated variables need not be independent or even pairwise independent.
To see that independent or pairwise independent variables X_1, . . . , X_n are also uncorrelated, we simplify
EX_i X_j (i ≠ j) as follows:
    EX_i X_j = \sum_{x_i \in X_i(\Omega)} \sum_{x_j \in X_j(\Omega)} x_i x_j P(X_i = x_i, X_j = x_j)
             = \sum_{x_i \in X_i(\Omega)} \sum_{x_j \in X_j(\Omega)} x_i x_j P(X_i = x_i) P(X_j = x_j | X_i = x_i)
             \overset{(a)}{=} \sum_{x_i \in X_i(\Omega)} \sum_{x_j \in X_j(\Omega)} x_i x_j P(X_i = x_i) P(X_j = x_j)
             = \sum_{x_i \in X_i(\Omega)} x_i P(X_i = x_i) \sum_{x_j \in X_j(\Omega)} x_j P(X_j = x_j)
             = EX_i EX_j,
where the equality (a) holds when X_i and X_j are independent. Therefore, if for all i and j (i ≠ j), X_i and
X_j are independent, namely X_1, . . . , X_n are pairwise independent, then the variables are also uncorrelated. If
the stronger condition of independence holds, then the variables are also pairwise independent and therefore
also uncorrelated.
The following probability distribution illustrates a case where the variables X1 and X2 are uncorrelated,
but they are not independent.
    X_1    X_2    P(X_1, X_2)
    -1     -1     1/12
    -1      0     1/6
    -1      1     1/12
     0     -1     1/9
     0      0     1/9
     0      1     1/9
     1     -1     1/15
     1      0     1/5
     1      1     1/15
Furthermore, since there are only two variables, they are not pairwise independent either. Now P(X_1 =
−1) = P(X_1 = 0) = P(X_1 = 1) = 1/3. Observe that X_1 and X_2 are not independent since

    P(X_2 = 1 | X_1 = 1) = \frac{P(X_1 = 1, X_2 = 1)}{P(X_1 = 1)} = \frac{1/15}{1/3} = \frac{1}{5},

while

    P(X_2 = 1 | X_1 = 0) = \frac{P(X_1 = 0, X_2 = 1)}{P(X_1 = 0)} = \frac{1/9}{1/3} = \frac{1}{3}.
On the other hand, EX1 X2 = 0 and EX1 = 0, which implies that
corr(X1 , X2 ) = EX1 X2 − EX1 EX2 = 0,
namely X1 and X2 are uncorrelated.
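You can also verify this directly from the joint pmf in a few lines of MATLAB (a minimal sketch; the matrix P below is just the table above, with rows indexed by the value of X_1 and columns by the value of X_2):

    % Minimal sketch: compute corr(X1, X2) from the joint pmf table above.
    x = [-1 0 1];                                        % values taken by X1 and X2
    P = [1/12 1/6 1/12; 1/9 1/9 1/9; 1/15 1/5 1/15];     % P(i,j) = P(X1 = x(i), X2 = x(j))
    EX1   = x * sum(P, 2);                               % marginal expectation of X1
    EX2   = sum(P, 1) * x';                              % marginal expectation of X2
    EX1X2 = x * P * x';                                  % E[X1*X2]
    corr12 = EX1X2 - EX1 * EX2                           % equals 0: uncorrelated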
HW 5 This is a computer assignment. If you are unfamiliar with MATLAB, please come and see me.
1. How would you generate X1 , a Bernoulli variable with probability p = 1/4?
2. Generate 100 independent Bernoulli 1/4 variables in MATLAB, X1 , . . . ,X100 . That is, each variable
Xi must be 1 with probability 1/4 no matter what the values of the other random variables are.
Suppose you have 100 Bernoulli 1/4 trials. Define Yk to be the sum of the first k trials and Yn−k+1 to
be the sum of the last k trials. What is the expectation of Yk and of Yn−k+1 ? The variance of Yk and of
Yn−k+1 ?
Now generate 50 binary 100-length sequences X(i) = X1 , . . . ,X100 as follows. Each sequence X(i) is the
100-bit sequence you get as a result of 100 independent Bernoulli 1/4 trials. So you do a total of 50×100
independent Bernoulli 1/4 trials in all.
For the ith sequence, let Y_k^{(i)} be the sum of the first k trials, while Y_{n−k+1}^{(i)} is the sum of the last k trials.
Fix k = 75. (A starting sketch in MATLAB appears after this problem.)

1. Plot the values of Y_k^{(i)} against Y_{n−k+1}^{(i)}.

2. Use MATLAB to find the best (minimum least square error) linear fit between Y_k^{(i)} and Y_{n−k+1}^{(i)}. What
is the slope of this line?

3. Find the sample expectation of Y_k^{(i)} and Y_{n−k+1}^{(i)}, namely

       \bar{Y}_k = \frac{\sum_{i=1}^{50} Y_k^{(i)}}{50},

   the sample variance of Y_k^{(i)} and Y_{n−k+1}^{(i)},

       \frac{\sum_{i=1}^{50} \left( Y_k^{(i)} − \bar{Y}_k \right)^2}{50},

   as well as the sample correlation coefficient (define it yourself). Note that the sample mean, the sample
   variance and sample correlation coefficient are not necessarily the true values of mean, variance and
   correlation coefficients; rather, they are our approximations from the observations.
4. Why do we compute sample expectation like above? Note that the random variables do not have
uniform distributions—does it matter?
5. How do you think the slope of the linear fit above, the correlation coefficient and the variances are
related? Try ρ(Y_k, Y_{n−k+1}) × \frac{\sqrt{var(Y_k)}}{\sqrt{var(Y_{n−k+1})}} if Y_k is on the y-axis.
Instead of 50 sequences X(i) , generate 100 sequences. Then 1000 sequences. What happens as you increase
the number of sequences?
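If you are stuck getting started, the following is a minimal MATLAB sketch of the setup for this problem (not the assigned solution; the names nSeq, len, Yfirst, Ylast are illustrative choices):

    % Hedged sketch: generate the 50 sequences of 100 Bernoulli(1/4) trials and
    % compute Y_k^(i) and Y_(n-k+1)^(i) for k = 75.
    p = 1/4; nSeq = 50; len = 100; k = 75;
    X = rand(nSeq, len) < p;                 % row i holds the ith 100-bit sequence
    Yfirst = sum(X(:, 1:k), 2);              % Y_k^(i): sum of the first k trials
    Ylast  = sum(X(:, len-k+1:end), 2);      % Y_(n-k+1)^(i): sum of the last k trials
    scatter(Yfirst, Ylast);                  % part 1: plot one against the other
    c = polyfit(Yfirst, Ylast, 1);           % part 2: least-squares linear fit; c(1) is the slope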