Random variables, mean and variance:
Suppose in a collection of people there is some number with height 6', and equal
numbers with heights 5'11" and 6'1". The mean or average of this distribution is 6', as
can be determined by summing the heights of all the people and dividing by the number
of people, or equivalently by summing over distinct heights weighted by the fraction
of people with that height. Suppose, for example, that the numbers of people at heights
5'11", 6', and 6'1" are 5, 30, and 5; then the latter calculation corresponds to (1/8) · 5'11" + (3/4) ·
6' + (1/8) · 6'1" = 6'. But the average gives only limited information about a distribution.
Suppose instead there were only people with heights 5' and 7', an equal number of each;
then the average would still be 6', though these are very different distributions. It is useful
to characterize the variation within the distribution from the mean. The average deviation
from the mean gives zero due to equal positive and negative variations (as proven below), so
the quantity known as the variance (or mean square deviation) is defined as the average of
the squares of the differences between the values in the distribution and their mean. For the
first distribution above, this gives the variance V = (1/8)(−1")^2 + (3/4)(0")^2 + (1/8)(1")^2 = 1/4 (inch)^2,
and for the second distribution the much larger result V = (1/2)(−1')^2 + (1/2)(1')^2 = 1 (foot)^2.
The standard or r.m.s. (“root mean square”) deviation σ is defined as the square root of
the variance, σ = \sqrt{V}. The above two distributions have σ = (1/2 inch) and σ = (1 foot)
respectively.
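As a quick numerical check (a sketch in Python, not part of the original notes), the weighted-fraction calculation of the mean and variance for the first distribution can be done directly:

# Heights in inches (5'11", 6', 6'1") with the fraction of people at each.
heights = [71, 72, 73]
fracs = [1/8, 3/4, 1/8]
mean = sum(f * h for f, h in zip(fracs, heights))               # 72.0 inches
var = sum(f * (h - mean)**2 for f, h in zip(fracs, heights))    # 0.25 (inch)^2
print(mean, var, var**0.5)                                      # 72.0 0.25 0.5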
The two distributions can be plotted as histograms:

from pylab import *

# 5, 30, 5 people at heights 73", 72", 71"; 20 people each at 60" and 84".
aheights = [6*12+1]*5 + [6*12]*30 + [5*12+11]*5
bheights = [5*12]*20 + [7*12]*20
figure(figsize=(5,5))
hist(aheights, bins=arange(59.5,90))
hist(bheights, bins=arange(59.5,90))
xlabel('inches')
ylabel('#people with that height')
legend(['mean {}\n stdev {}'.format(mean(d), std(d))
        for d in (aheights, bheights)])
savefig('hhist.pdf')

[Figure: overlaid histograms of aheights and bheights ("#people with that height" vs. "inches"); the legend reads "mean 72.0, stdev 0.5" and "mean 72.0, stdev 12.0".]
More generally, a random variable is a function X : S → ℝ, assigning some real number to each element of the probability space S. The average of this variable is determined by summing the values it can take, weighted by the corresponding probability,

<X> = \sum_{s \in S} p(s) X(s) .

(An alternate notation for this is E[X] = <X>, for the “expectation value” of X.)
Example 1: roll two dice and let X be the sum of the two numbers rolled. Thus
X({1, 1}) = 2, X({1, 2}) = X({2, 1}) = 3, ..., X({6, 6}) = 12. The average of X is
<X> = \frac{1}{36}\cdot 2 + \frac{2}{36}\cdot 3 + \frac{3}{36}\cdot 4 + \frac{4}{36}\cdot 5 + \frac{5}{36}\cdot 6 + \frac{6}{36}\cdot 7 + \frac{5}{36}\cdot 8 + \frac{4}{36}\cdot 9 + \frac{3}{36}\cdot 10 + \frac{2}{36}\cdot 11 + \frac{1}{36}\cdot 12 = 7 .
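The same average can be checked by brute-force enumeration over the 36 equally likely outcomes (a sketch, not part of the original notes):

from itertools import product

# Sum of the two dice for each of the 36 equally likely outcomes.
sums = [a + b for a, b in product(range(1, 7), repeat=2)]
print(sum(sums) / len(sums))   # 7.0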
Example 2: flip a coin 3 times, and let X be the number of tails. The average is
<X> = \frac{1}{8}\cdot 3 + \frac{3}{8}\cdot 2 + \frac{3}{8}\cdot 1 + \frac{1}{8}\cdot 0 = \frac{3}{2} .
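This too can be verified by enumerating all 2^3 = 8 equally likely flip sequences (again a sketch, not from the original notes):

from itertools import product

# Count the tails in each of the 8 equally likely sequences of 3 flips.
tails = [seq.count('T') for seq in product('HT', repeat=3)]
print(sum(tails) / len(tails))   # 1.5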
The expectation of the sum of two random variables X, Y (defined on the same sample space) satisfies <X + Y> = <X> + <Y>. In general, they satisfy a “linearity of expectation,” <aX + bY> = a<X> + b<Y>, proven as follows:

<aX + bY> = \sum_s p(s)\,(aX(s) + bY(s)) = a\sum_s p(s)X(s) + b\sum_s p(s)Y(s) = a<X> + b<Y> .
Thus an alternate way to calculate the mean of X = X_1 + X_2 for the two dice rolls in
example 1 above is to calculate the mean for a single die, <X_1> = (1 + 2 + 3 + 4 + 5 + 6)/6 =
21/6 = 7/2, and so for two rolls <X> = <X_1> + <X_2> = 7/2 + 7/2 = 7.
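Linearity can also be checked directly on the two-dice sample space (a sketch; the coefficients a = 2, b = 3 are arbitrary illustrative choices, not from the notes):

from itertools import product

# <aX + bY> over all 36 outcomes vs. a<X> + b<Y>, with X, Y the two dice.
a, b = 2, 3
outcomes = list(product(range(1, 7), repeat=2))
lhs = sum(a * x + b * y for x, y in outcomes) / len(outcomes)
rhs = a * 3.5 + b * 3.5   # single-die mean is 7/2
print(lhs, rhs)           # 17.5 17.5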
By definition, independent random variables X, Y satisfy p(X=a ∧ Y =b) = p(X =
a)p(Y = b) (i.e., the joint probability is the product of their independent probabilities,
just as for independent events). For such variables, it follows that the expectation value
of their product satisfies
<XY> = <X><Y>   (X, Y independent)

since \sum_{r,s} p(r, s)\,X(r)Y(s) = \sum_{r,s} p(r)p(s)\,X(r)Y(s) = \left(\sum_r p(r)X(r)\right)\left(\sum_s p(s)Y(s)\right) .
To see that the above relation fails when X and Y are not independent, consider a
single coin flip, and let X count the number of heads and Y the number of tails.
Then <X> = <Y> = 1/2, but <XY> = 0, since one of X or Y is always zero on any
given flip. On the other hand, consider flipping a coin 10 times and rolling a die 12 times,
and let X count the number of heads among the coin flips and Y the number of times a six is
rolled. Then <XY> = <X><Y> = 5 · 2 = 10.
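Both behaviors can be checked numerically (a sketch, not part of the original notes; the dependent case by exact enumeration, the independent case by simulation):

import random

# Dependent case: one flip, X = #heads, Y = #tails, outcomes H and T equally likely.
pairs = [(1, 0), (0, 1)]                  # (X, Y) for H and for T
EX = sum(x for x, _ in pairs) / 2         # 0.5
EY = sum(y for _, y in pairs) / 2         # 0.5
EXY = sum(x * y for x, y in pairs) / 2    # 0.0, not EX * EY = 0.25
print(EX * EY, EXY)

# Independent case: 10 flips and 12 die rolls; <XY> should be near 5 * 2 = 10.
random.seed(0)
trials = 100_000
total = 0.0
for _ in range(trials):
    X = sum(random.random() < 0.5 for _ in range(10))       # heads in 10 flips
    Y = sum(random.randint(1, 6) == 6 for _ in range(12))   # sixes in 12 rolls
    total += X * Y
print(total / trials)   # approximately 10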
As indicated above, the average of the deviations of a random variable from its mean vanishes:

\sum_{s \in S} p(s)\,(X(s) - <X>) = <X> - <X>\sum_{s} p(s) = <X> - <X> = 0 .

The variance of a probability distribution for a random variable is defined as the average of the squared differences from the mean,

V[X] = \sum_{s \in S} p(s)\,(X(s) - <X>)^2 .   (V1)
The variance satisfies the important relation

V[X] = <X^2> - <X>^2 ,   (V2)

following directly from the definition above:

V[X] = \sum_{s \in S} p(s)\,(X(s) - <X>)^2
     = \sum_{s} X^2(s)\,p(s) - 2<X>\sum_{s} p(s)X(s) + <X>^2 \sum_{s} p(s)
     = <X^2> - 2<X>^2 + <X>^2 = <X^2> - <X>^2 .
In the case of independent random variables X, Y, as defined above, the variance is
additive:

V[X + Y] = V[X] + V[Y] .

To see this, use (V2) together with <XY> = <X><Y>:

V[X + Y] = <(X + Y)^2> - (<X> + <Y>)^2
         = <X^2> + 2<XY> + <Y^2> - <X>^2 - 2<X><Y> - <Y>^2
         = <X^2> - <X>^2 + <Y^2> - <Y>^2 = V[X] + V[Y] .
Example: again flip a coin 3 times, and let X be the number of tails. Then

<X^2> = \frac{1}{8}\cdot 0^2 + \frac{3}{8}\cdot 1^2 + \frac{3}{8}\cdot 2^2 + \frac{1}{8}\cdot 3^2 = 3 ,

so V[X] = 3 - (3/2)^2 = 3/4. If we let X = X_1 + X_2 + X_3, where X_i is the number of
tails (0 or 1) for the ith flip, then the X_i are independent variables with <X_i> = 1/2
and <X_i^2> = (1/2)·1 + (1/2)·0 = 1/2, so V[X_i] = 1/2 - 1/4 = 1/4 (or equivalently
V[X_i] = (1/2)(1/2)^2 + (1/2)(-1/2)^2 = 1/8 + 1/8 = 1/4). For the three flips,

V[X] = V[X_1] + V[X_2] + V[X_3] = 1/4 + 1/4 + 1/4 = 3/4 ,

confirming the result above.
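Both routes to V[X] = 3/4 can be confirmed numerically (a sketch, not part of the original notes):

from itertools import product

# Tail counts over the 8 equally likely 3-flip sequences; apply (V2).
X = [seq.count('T') for seq in product('HT', repeat=3)]
EX = sum(X) / len(X)                   # 1.5
EX2 = sum(x * x for x in X) / len(X)   # 3.0
print(EX2 - EX**2)                     # 0.75, i.e. 3/4 = V[X1] + V[X2] + V[X3]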
Here's a brief summary:

Expectation value: E[X] = \sum_{s \in S} p(s) X(s)

Variance: V[X] = \sum_{s \in S} p(s)\,(X(s) - E[X])^2 = E[X^2] - (E[X])^2

Standard deviation: σ[X] = \sqrt{V[X]}

For X a sum of random variables X = \sum_i X_i, the expectation always satisfies E[X] = \sum_i E[X_i].

If the variables X and Y are independent, then E[XY] = E[X] E[Y].

If all the variables X_i are independent, then V[X] = \sum_i V[X_i].

Example of coin flips (X_i = 1 or 0 according to whether or not the ith flip is heads):
for each coin flip, E[X_i] = E[X_i^2] = 1/2, so V[X_i] = 1/2 - 1/4 = 1/4.
Since the flips are independent, for n such flips

E[X] = n/2 ,   V[X] = n/4 ,   σ[X] = \sqrt{n}/2 .

Note that the fractional standard deviation σ[X]/E[X] = 1/\sqrt{n} → 0 for large n,
so the relative spread of the distribution goes to zero for a large number of trials
(the distribution becomes more tightly centered on the mean).
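A short illustration of the 1/\sqrt{n} falloff (a sketch, not part of the original notes):

from math import sqrt

# Fractional standard deviation sigma/E = (sqrt(n)/2) / (n/2) = 1/sqrt(n).
for n in (10, 100, 1000, 10000):
    print(n, (sqrt(n) / 2) / (n / 2))   # 0.316..., 0.1, 0.0316..., 0.01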
Bernoulli Trial
A Bernoulli trial is a trial with two possible outcomes: “success” with probability p,
and “failure” with probability 1 − p. The probability of r successes in N trials is
\binom{N}{r}\, p^r (1-p)^{N-r} .

Note that the correct overall normalization automatically follows from \sum_{r=0}^{N} \binom{N}{r} p^r (1-p)^{N-r} = (p + (1-p))^N = 1^N = 1. The overall probability for r successes is a competition between \binom{N}{r}, which is maximum at r ∼ N/2, and p^r (1-p)^{N-r}, which is largest for small r when p < 1/2 (or large r for p > 1/2).
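This probability is easy to compute directly (a minimal sketch, not from the notes; the helper name binom_pmf is our own):

from math import comb

def binom_pmf(r, N, p):
    """Probability of r successes in N Bernoulli(p) trials."""
    return comb(N, r) * p**r * (1 - p)**(N - r)

# Normalization check: the probabilities for r = 0..N sum to 1.
print(sum(binom_pmf(r, 10, 1/6) for r in range(11)))   # 1.0 (up to rounding)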
In class, we considered the case of rolling a standard six-sided die, with a roll of 6
considered a success, so p = 1/6. (See the figures at the end of these notes, showing \binom{N}{r} p^r (1-p)^{N-r}
for N = 1, 2, 4, 10, 40, 80, 160, 320 trials, with the number of successes r plotted along the
horizontal axis for each value of N.) For a larger number N of trials, the distribution of the
expected number of successes becomes more narrowly peaked and more symmetrical about
the mean r = N/6.
To analyze this in the framework outlined above, let the random variable X_i = 1 if
the ith trial is a success, and X_i = 0 otherwise. Then <X_i> = p. Let X = X_1 + X_2 + ... + X_N count the total
number of successes. Then it follows that the average satisfies

<X> = \sum_i <X_i> = N p .   (B1)
From V[X_i] = <X_i^2> - <X_i>^2 = p - p^2 = p(1-p), it follows that the variance satisfies

V[X] = \sum_i V[X_i] = N p(1-p) ,   (B2)

and the standard deviation is σ = \sqrt{V[X]} = \sqrt{N p(1-p)}. (Note that for p = 1/2 and
N = 3, this gives V[X] = 3/4, reproducing the result of the coin flip example above.)
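The relations (B1) and (B2) can be checked against the binomial probabilities (a sketch; the binom_pmf helper from above is repeated so the snippet stands alone):

from math import comb

def binom_pmf(r, N, p):
    return comb(N, r) * p**r * (1 - p)**(N - r)

N, p = 12, 1/6
mean = sum(r * binom_pmf(r, N, p) for r in range(N + 1))
var = sum((r - mean)**2 * binom_pmf(r, N, p) for r in range(N + 1))
print(mean, N * p)            # both 2.0
print(var, N * p * (1 - p))   # both 1.666...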
This explains the observation that the probability gets more sharply peaked as the
number of trials increases, since the width of the distribution (σ) divided by the average
<X> behaves as σ/<X> ∼ \sqrt{N}/N ∼ 1/\sqrt{N}, a decreasing function of N.
By the “central limit theorem” (not proven in class), many such distributions, under
fairly relaxed assumptions, tend for a sufficiently large number of trials to a “gaussian”
or “normal” distribution of the form (as shown explicitly in lecture 22 notes)

P(x) ≈ \frac{1}{σ\sqrt{2π}}\, e^{-(x-µ)^2/(2σ^2)} .   (G)

This is properly normalized, with \int_{-\infty}^{\infty} dx\, P(x) = 1, and also has \int_{-\infty}^{\infty} dx\, x P(x) = µ and
\int_{-\infty}^{\infty} dx\, x^2 P(x) = σ^2 + µ^2, so the above distribution has mean µ and variance σ^2. Setting
µ = N p and σ = \sqrt{N p(1-p)} for p = 1/6 in (G) thus gives a good approximation to the
distribution of successful rolls of 6 for a large number of trials in the example above.
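A numerical comparison near the peak (a sketch, not the notes' own code) of the binomial probabilities with the gaussian (G), using µ = Np and σ = \sqrt{Np(1-p)}:

from math import comb, exp, pi, sqrt

N, p = 320, 1/6
mu, sigma = N * p, sqrt(N * p * (1 - p))   # 53.33..., 6.67...
for r in range(50, 57):
    binom = comb(N, r) * p**r * (1 - p)**(N - r)
    gauss = exp(-(r - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))
    print(r, round(binom, 4), round(gauss, 4))   # the two columns nearly agree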
[Figures: “Probability of r sixes in N trials” for N = 1, 2, 4, 10, 40, 80, 160, 320, one panel per N; horizontal axis: Number of sixes, vertical axis: Probability.]