The mean and the variance of the binomial distribution are

    \mu = n\theta \qquad \text{and} \qquad \sigma^2 = n\theta(1 - \theta)

Proof. The mean is

    \mu = \sum_{x=0}^{n} x \binom{n}{x} \theta^x (1 - \theta)^{n-x}
        = \sum_{x=1}^{n} \frac{n!}{(x-1)!\,(n-x)!}\, \theta^x (1 - \theta)^{n-x}

where we omitted the term corresponding to x = 0, which is 0, and canceled the x against the first factor of x! = x(x - 1)! in the denominator of \binom{n}{x}. Then, factoring out the factor n in n! = n(n - 1)! and one factor \theta, we get

    \mu = n\theta \sum_{x=1}^{n} \binom{n-1}{x-1} \theta^{x-1} (1 - \theta)^{n-x}

and, letting y = x - 1 and m = n - 1, this becomes

    \mu = n\theta \sum_{y=0}^{m} \binom{m}{y} \theta^{y} (1 - \theta)^{m-y} = n\theta

since the last summation is the sum of all the values of a binomial distribution with the parameters m and \theta, and hence equal to 1.

To find expressions for \mu_2' and \sigma^2, let us make use of the fact that E(X^2) = E[X(X - 1)] + E(X) and first evaluate E[X(X - 1)]. Duplicating for all practical purposes the steps used before, we thus get

    E[X(X - 1)] = \sum_{x=0}^{n} x(x - 1) \binom{n}{x} \theta^x (1 - \theta)^{n-x}
                = \sum_{x=2}^{n} \frac{n!}{(x-2)!\,(n-x)!}\, \theta^x (1 - \theta)^{n-x}
                = n(n - 1)\theta^2 \sum_{x=2}^{n} \binom{n-2}{x-2} \theta^{x-2} (1 - \theta)^{n-x}

and, letting y = x - 2 and m = n - 2, this becomes

    E[X(X - 1)] = n(n - 1)\theta^2 \sum_{y=0}^{m} \binom{m}{y} \theta^{y} (1 - \theta)^{m-y} = n(n - 1)\theta^2

Therefore,

    \mu_2' = E[X(X - 1)] + E(X) = n(n - 1)\theta^2 + n\theta

and, finally,

    \sigma^2 = \mu_2' - \mu^2 = n(n - 1)\theta^2 + n\theta - n^2\theta^2 = n\theta(1 - \theta)

An alternative proof of this theorem, requiring much less algebraic detail, is suggested in Exercise 5.6.
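These results are easy to verify numerically. The following sketch (Python, with illustrative values n = 10 and θ = 0.3 chosen here for the example, not taken from the text) computes the mean and variance of a binomial distribution by direct summation and compares them with nθ and nθ(1 − θ).

```python
from math import comb

def binom_pmf(x, n, theta):
    """Binomial probability of exactly x successes in n trials."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

n, theta = 10, 0.3  # illustrative values only

# Brute-force moments from the definition of expectation.
mean = sum(x * binom_pmf(x, n, theta) for x in range(n + 1))
second_raw = sum(x**2 * binom_pmf(x, n, theta) for x in range(n + 1))
variance = second_raw - mean**2

print(mean, n * theta)                     # both about 3.0
print(variance, n * theta * (1 - theta))   # both about 2.1
```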
E(x) = ∑ x p(x) = ∑ x f(x), which reduces to ∑ x (1/N) when each of N values is equally likely.

For a single Bernoulli trial (n = 1):

p(x = 1) = π,  p(x = 0) = 1 − π

  x      f(x)      x f(x)
  0      1 − π     0
  1      π         π
  ∑      1         π

so π = m'₁.

For r = 1, 2, 3, 4, …:

m'_r = E(X^r) = π

m'₁ = E(X) = π
m'₂ = E(X²) = π

m₂ = E(X − π)² = E(X²) − π² = π − π² = π(1 − π) = πq

m₃ = E(X − π)³ = E(X³ − 3X²π + 3Xπ² − π³)
   = m'₃ − 3 m'₁ m'₂ + 3 m'₁³ − m'₁³
   = m'₃ − 3 m'₁ m'₂ + 2 m'₁³
   = π − 3π² + 2π³
   = π(1 − π)(1 − 2π) = πq(1 − 2π)

For the binomial with n trials, the fourth central moment is

m₄ = 3(nπq)² + nπq(1 − 6πq)    (7.19)
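As a quick check on these moment formulas, the sketch below (Python, with an arbitrary π = 0.3 chosen for illustration) computes the second and third central moments of a single Bernoulli trial by direct summation and compares them with πq and πq(1 − 2π).

```python
pi = 0.3          # illustrative success probability
q = 1 - pi
values = [0, 1]
probs = [q, pi]   # Bernoulli probability function

def central_moment(r):
    """r-th moment about the mean, computed from the definition."""
    mean = sum(x * p for x, p in zip(values, probs))
    return sum((x - mean)**r * p for x, p in zip(values, probs))

print(central_moment(2), pi * q)                 # both about 0.21
print(central_moment(3), pi * q * (1 - 2 * pi))  # both about 0.084
```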
The probability of scoring high enough for an A is given by the binomial probability function; evaluated numerically, it turns out to be quite small.
So it would seem that your chances of getting an A by random trial are not very likely. How about settling for a C or better? If a C or better requires a score of at least 7, we can calculate that probability too. But before you rush off to do that, wait a moment to think. Adding up all the probabilities from 7 to 20 is a pain; so instead, why not add up the probabilities from 0 to 6 and subtract from 1? The answer is about .94. So by "guessing" you have a 94% chance of getting a C or better! You have only a 6% chance of failing the course. Now you know why professional testers, such as the Educational Testing Service in Princeton, deduct penalties for wrong answers: to penalize guessing.
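The 1 − P(X ≤ 6) shortcut is easy to verify numerically. The sketch below assumes the quiz setup implied here: 20 questions with a .5 chance of guessing each one correctly. Both numbers are assumptions inferred from the 94% figure in the text rather than stated here.

```python
from math import comb

n, p = 20, 0.5  # assumed: 20 questions, probability .5 of a correct guess

def binom_pmf(k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_c_or_better = 1 - sum(binom_pmf(k) for k in range(7))  # 1 - P(X <= 6)
print(round(p_c_or_better, 3))  # about 0.942, i.e., roughly a 94% chance
```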
Now that we have seen how to use our probability function to answer some simple
probability questions, let us explore the distribution itself and discover for ourselves
what role the parameters play. Figures 7.1 and 7.2 show the plots of the probabilities for various values of n and π against the value of the random variable. The figures illustrate the binomial probability function that we presented in Equation 7.9.
Let us look at the plots of the theoretical probabilities. In Figure 7.1, π = .7, and the probability distribution is left tailed. Notice in Figure 7.2 that when π = .5, the probability distribution is symmetric, but as the value of π falls to .05 the symmetry decreases. The probability distribution for values of π less than .5 is right tailed. Notice that as the value of n increases the degree of asymmetry decreases.
Further, you may notice that as the value of n increases the peakedness of the probability distribution decreases. We will need some more tools to explore the reasons for
these observations; we will provide those tools in the next section.
Figure 7.1  Plot of a binomial probability distribution (n = 15; π = .7; horizontal axis: value of K)

Figure 7.2  Comparison of binomial probability distributions with different parameters (nine panels: n = 20, 40, 120 with π = .5, .2, .05; horizontal axis: value of K)
We developed this theory to throw light on the empirical results that we obtained in previous chapters, but if we were to examine some experiments using the binomial probability distribution, would we confirm our theoretical findings? Table 7.2 summarizes the results from running each of nine experiments a thousand times. The nine experiments were defined by setting the probability of a single success, π, to the values .5, .2, and .05 and by setting the number of trials, n, to the values 20, 40, and 120.
Imagine that in the first experiment with π = .5 and n = 20 we ran the experiment and got some number between 0 and 20 as an outcome. We then repeated that experiment 1000 times; on each run we would get an outcome that is a number between 0 and 20, which is the range for the random variable. For each setting for n and π we get 1000 numbers that are obtained experimentally from a binomial probability distribution that has parameters n and π. Using these data we can calculate the various sample moments that we examined in Chapter 4. We use the phrase "sample moments" advisedly to stress that the calculations are obtained from observing the outcomes of actual experiments.
Examining the summary statistics shown in Table 7.2, we see that the sample mean increases with the value of n and with the value of π. The sample second moment seems to behave in the same way. The standardized sample third moment, α̂₁, is near zero for π = .5. It increases as π gets smaller but declines as n increases. The standardized sample fourth moment, α̂₂, stays near 3.0, except for π = .05 at n = 20.
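A small simulation along the lines of Table 7.2 is easy to set up. The sketch below draws 1,000 binomial outcomes for one (n, π) setting and computes the sample mean, second central moment, and standardized third and fourth moments; the exact numbers will of course differ from run to run and from the table.

```python
import random

def sample_moments(data):
    """Sample mean, m2, and standardized third and fourth moments."""
    n_obs = len(data)
    mean = sum(data) / n_obs
    m2 = sum((x - mean)**2 for x in data) / n_obs
    m3 = sum((x - mean)**3 for x in data) / n_obs
    m4 = sum((x - mean)**4 for x in data) / n_obs
    return mean, m2, m3 / m2**1.5, m4 / m2**2

random.seed(1)
n, pi = 20, 0.5  # one of the nine experimental settings
draws = [sum(random.random() < pi for _ in range(n)) for _ in range(1000)]
print(sample_moments(draws))  # roughly (10, 5, 0, 3)
```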
From the calculated sample moments, we appear to have discovered that as the
parameters are varied, both the shape of the distribution and the values taken by the
sample moments change in seemingly predictable ways. But can we claim that we
have confirmed our theory; that is, are the observed results in agreement with the theoretical claims? They seem to be approximately in agreement, but we do not yet know
how to measure the extent of the difference, nor how to evaluate the difference, when
measured. We still have to discover how to decide what is and is not an example of
disagreement between theory and observation. That task will take us the remainder of
the text. For now, we need new tools to proceed.
Theoretical Moments and the Shape of the Probability Distribution
In previous chapters, we discussed shape and the role of sample moments in the context of histograms. Given that probability distributions are the theoretical analogues of
histograms, we should be able to extend the basic idea of a "moment" to probability
distributions.
Let us recall the simplest moment, the mean, both in terms of raw data and in terms of data in cells:

    m_1' = \frac{1}{n}\sum_{i=1}^{n} x_i \approx \sum_{j=1}^{k} f_j c_j

where "≈" indicates that the right side is only approximately equal to the left side, f_j is the relative frequency in the jth cell, c_j is the cell mark in the jth cell, and the n observations are in k cells.
Table 7.2  Sample Moments from a Series of Binomial Distributions

  n      π      m'₁      m₂       α̂₁        α̂₂
  20     .50    10.05     4.39    -0.023    2.69
  40     .50    20.21     9.56    -0.028    2.95
  120    .50    59.96    27.80    -0.084    2.93
  20     .20     4.07     3.53     0.450    3.22
  40     .20     8.10     6.63     0.045    3.03
  120    .20    24.18    19.02     0.013    3.15
  20     .05     1.02     0.88     1.040    5.87
  40     .05     2.00     2.01     0.690    3.32
  120    .05     5.93     5.80     0.340    2.96
The first definition using raw data can also be interpreted in terms of relative frequencies if we regard 1/n as the relative frequency of a single observation in a set of size n. We can abstract from this idea by noting that the sample
moments we defined previously are weighted sums of the values of the variable,
where the weights are given by the relative frequencies. Because probabilities are the
theoretical analogues of relative frequencies, we have an immediate suggestion for a
theoretical extension of the notion of "moment."
Let us define theoretical moments as the weighted sums of the values of the random variable, where the weights are given by the probability distribution. More formally, we write

    \mu_1' = \sum_{i=1}^{n} X_i \, pr(X_i)

to express the first theoretical moment about the origin for a discrete random variable with n elements in the sample space and probabilities given by pr(Xᵢ).
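For a concrete instance of this definition, the sketch below computes μ'₁ for a small, made-up discrete distribution; the values and probabilities are illustrative only.

```python
# A hypothetical discrete random variable and its probability function.
values = [1, 2, 3, 4]
probs = [0.1, 0.2, 0.3, 0.4]   # must sum to one

mu1_prime = sum(x * p for x, p in zip(values, probs))  # weighted sum of values
print(mu1_prime)  # 3.0
```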
The use of the Greek letter μ is to remind us that this definition is theoretical
and therefore to be distinguished from sample moments, which are defined with
respect to actual data. We have already discussed this important issue of notation,
but it bears repeating. In the chapters to come, you will frequently have to distinguish between observations, represented by lowercase Roman letters, especially at the end of the alphabet; random variables that have distributions attached, represented by uppercase Roman letters; and parameters and properties of probability distributions, represented by Greek letters.
For example, if you see m'₁, you know that it represents adding up n observed numbers and dividing by n, the number of data points observed; m'₁ is a quantity. That means it is a number with units of measurement attached and is, through the observed data, observed itself. In contrast, when you see μ'₁, you know that this is a theoretical
concept and is therefore an abstraction.
Random variables and probability distribution functions and theoretical moments
are concepts enabling us to think about what we are observing. What we actually observe and the operations we perform on those observations are different. We use our
theoretical ideas to interpret what we see. Force, momentum, velocity, and mass are
all theoretical ideas in physics used to explain, or interpret, what we see and feel. But,
a tennis ball hitting you smack in the face is reality; interpreting the event in terms of force, mass, and velocity is the abstraction. The symbol μ'₁ is a theoretical concept used to interpret the reality of the quantity m'₁.
Let us return to developing our concepts of theoretical moments. We define the rth theoretical moment about the origin as

    \mu_r' = \sum_{i=1}^{n} X_i^r \, pr(X_i)

where \mu_r' is nothing more than the weighted sum of the values of a random variable raised to the rth power; the weights are given by the corresponding probabilities.
When we dealt with moments of actual data, we found that the moments about the mean were the most useful. Similarly in this case, we will discover that our most useful theoretical moments are moments about the "mean"; but what "mean"? A good guess is to use μ'₁; and indeed μ'₁ is (as is m'₁) called the mean. By now you are getting the picture that the word "mean" has a lot of meanings!

We define theoretical moments about the mean by

    \mu_r = \sum_{i=1}^{n} (X_i - \mu_1')^r \, pr(X_i)

where \mu_r is the weighted sum of differences between the values of the random variables and the theoretical mean raised to the rth power; the weights are the corresponding probabilities.
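Both definitions translate directly into code. The sketch below defines the rth moment about the origin and about the mean for an arbitrary discrete distribution and applies them to the binomial, using illustrative parameters n = 20 and π = .2.

```python
from math import comb

def raw_moment(values, probs, r):
    """r-th theoretical moment about the origin: sum of x^r * pr(x)."""
    return sum(x**r * p for x, p in zip(values, probs))

def central_moment(values, probs, r):
    """r-th theoretical moment about the mean."""
    mu1 = raw_moment(values, probs, 1)
    return sum((x - mu1)**r * p for x, p in zip(values, probs))

n, pi = 20, 0.2  # illustrative binomial parameters
xs = list(range(n + 1))
ps = [comb(n, k) * pi**k * (1 - pi)**(n - k) for k in xs]

print(raw_moment(xs, ps, 1))      # about 4.0  (= n*pi)
print(central_moment(xs, ps, 2))  # about 3.2  (= n*pi*q)
```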
We have introduced the idea of theoretical moments to help us measure the various
characteristics of the shape of probability distribution functions. We have done this in
analogy with our use of sample moments to measure the characteristics of histograms.
Let us now explore these new ideas using our first experimentally motivated probability function, the binomial probability function. Remember that our objective is to explain the change in shape of the binomial probability distribution in response to the
changes in the values of the parameters.
If we use the binomial probability distribution in the definition of μ'₁, let us see what we get:

    \mu_1' = \sum_{K=0}^{n} K \binom{n}{K} \pi^K (1 - \pi)^{n-K}

where K = 0, 1, . . . , n are the values of the random variable represented in the defining equation for μ'₁ by Xᵢ, i = 0, 1, 2, . . . , n. The first term in this expression drops out because K = 0:

    \mu_1' = \sum_{K=1}^{n} K \binom{n}{K} \pi^K (1 - \pi)^{n-K}

What we want to do now is to try to rewrite the last expression in terms of a binomial expansion from zero to its upper limit, as we did when we proved that the binomial probability distribution was a genuine probability distribution. But why? Because that expansion gives us a value of 1, so that whatever is left over is what we want. If we can reexpress the last expression in this way, we have solved our problem. Consider what happens in the last expression if we extract the product nπ from the summation:
    \mu_1' = n\pi \sum_{K=1}^{n} \frac{(n-1)!}{(K-1)!\,(n-K)!}\, \pi^{K-1} (1 - \pi)^{n-K}    (7.16)
Equation 7.16 is now easily reexpressed as a sum of binomial probabilities, which we know is one. In anticipation we have already reexpressed (n − K) as (n − 1 − (K − 1)). We can sum K* = (K − 1) from 0 to (n − 1), instead of summing K from 1 to n. We can do this because (K − 1), when K = 1, is 0 and is (n − 1) when K = n.
Consequently, we have a sum over K* of binomial probabilities from 0 to (n − 1), where the numerator in the combinatorial coefficient, \binom{n-1}{K-1}, is (n − 1). As we now know, this sum is one. We conclude that the mean, or the first theoretical moment about the origin, for the binomial probability distribution is nπ. As we suspected from the empirical results contained in Table 7.2, the mean is an increasing function of both n and π.
In a similar manner, we can derive the higher theoretical moments of the binomial probability distribution (you are shown how to do this in the Exercises for this chapter) as follows:

    \mu_2 = n\pi q

where q = (1 − π);

    \mu_3 = n\pi q (1 - 2\pi)

and

    \mu_4 = 3(n\pi q)^2 + n\pi q (1 - 6\pi q)
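These three expressions can be checked against brute-force sums over the binomial probability function, as in the sketch below (illustrative parameters n = 20, π = .2, chosen for the example).

```python
from math import comb

n, pi = 20, 0.2
q = 1 - pi
xs = range(n + 1)
ps = [comb(n, k) * pi**k * q**(n - k) for k in xs]
mu1 = sum(x * p for x, p in zip(xs, ps))

def mu(r):
    """r-th theoretical moment about the mean, by direct summation."""
    return sum((x - mu1)**r * p for x, p in zip(xs, ps))

print(mu(2), n * pi * q)                                           # both about 3.2
print(mu(3), n * pi * q * (1 - 2 * pi))                            # both about 1.92
print(mu(4), 3 * (n * pi * q)**2 + n * pi * q * (1 - 6 * pi * q))  # both about 30.85
```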
Before we start to use these theoretical moments to represent the shape of the probability distribution, we recognize from our experience in Chapter 6 that it is probably
important to standardize the theoretical moments beyond the second to allow for both
origin and scale effects, just as we had to do for the higher sample moments.
Following the precedent set in previous chapters, we define

    \alpha_1 = \frac{\mu_3}{(\mu_2)^{3/2}} \qquad \text{and} \qquad \alpha_2 = \frac{\mu_4}{(\mu_2)^2}

where α₁ and α₂ are the standardized theoretical moments in the sense that any
change in the origin for measuring the random variable or in the chosen scale for
measuring the random variable will leave the value of the standardized third and
fourth theoretical moments invariant. For example, if the random variable represents
temperature readings, then the values of α₁ and α₂ are the same no matter whether the
temperature readings are recorded in degrees Fahrenheit or degrees Celsius.
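The invariance claim is easy to illustrate: if every value of the random variable is transformed as Y = a + bX with b > 0, then μ₃ and μ₄ pick up factors of b³ and b⁴ while μ₂^{3/2} and μ₂² pick up exactly the same factors, so α₁ and α₂ are unchanged. A minimal sketch with made-up "temperature" values follows.

```python
def standardized_moments(values, probs):
    """Return (alpha1, alpha2) = (mu3 / mu2**1.5, mu4 / mu2**2)."""
    mu1 = sum(x * p for x, p in zip(values, probs))
    mu = lambda r: sum((x - mu1)**r * p for x, p in zip(values, probs))
    return mu(3) / mu(2)**1.5, mu(4) / mu(2)**2

celsius = [0, 10, 20, 35]            # hypothetical temperature readings
probs = [0.1, 0.4, 0.3, 0.2]
fahrenheit = [9 * c / 5 + 32 for c in celsius]   # change of origin and scale

print(standardized_moments(celsius, probs))
print(standardized_moments(fahrenheit, probs))   # identical pair of values
```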
Notice that the "little hats" are gone from al and a2. This is because we are now
dealing with the theoretical entities and not their sample analogues. If we were to use
our usual practice, we would have written a1 and a2 for the sample standardized moments in agreement with using m, for p,, but as you now see that would have been an
unfortunate choice given our use of a for so many other terms. This trick of using little hats to indicate that we are talking about the sample analogues, not the theoretical
quantities, will be used quite extensively in the next few chapters.
Let us write down our first two theoretical moments, followed by our standardized third and fourth theoretical moments for the binomial probability distribution:

    \mu_1' = n\pi; \qquad \mu_2 = n\pi q, \quad \text{where } q = (1 - \pi);

    \alpha_1 = \frac{\mu_3}{(\mu_2)^{3/2}} = \frac{1 - 2\pi}{\sqrt{n\pi q}}; \qquad \alpha_2 = \frac{\mu_4}{(\mu_2)^2} = 3 + \frac{1 - 6\pi q}{n\pi q}
We can now explore the relationship between theoretical moments, parameters,
and the shape of the probability function. Let us review the summary calculations
Table 7.3  Theoretical Moments for the Binomial Distribution for Various Parameters

  Parameter Values     Theoretical Moments
  n      π          μ'₁      μ₂       α₁      α₂
  20     .50        10       5.00     0.00    3.15
  20     .20         4       3.20     0.33    3.26
  20     .05         1       0.95     0.92    4.00
  40     .50        20      10.00     0.00    3.08
  40     .20         8       6.40     0.24    3.13
  40     .05         2       1.90     0.65    3.50
  120    .50        60      30.00     0.00    3.03
  120    .20        24      19.20     0.13    3.04
  120    .05         6       5.70     0.35    3.17
shown in Table 7.3 and the plots of the theoretical probabilities shown in Figure 7.2
to see the extent to which our theoretical results are useful. For the nine alternative
combinations of n and π that we have examined, we should compare the theoretical
moments with the values obtained for the sample moments using the experimental
data summarized in Table 7.2.
On comparing the theoretical values with the values obtained from the experiments, we observe a close, but by no means exact, relationship between the two sets
of numbers. Thinking about these numbers and their comparisons, we may realize
that a major problem that faces us is how to decide what is and is not close. How do
we decide when an empirical result "confirms" the theoretical statement, given that
we know that it is most unlikely that we will ever get perfect agreement between theory and observation? This is an important task that will have to be delayed until
Chapter 9.
The values for the theoretical moments listed in Table 7.3 and the expressions that we derived for the theoretical moments enable us to confirm the conjectures that we formulated on first viewing the summary statistics in Table 7.2. We see even more clearly from the algebraic expressions that μ'₁ increases in n and π; that μ₂ increases in n and in π up to .5; and that the standardized third and fourth theoretical moments decrease in n to minimum values that are 0.00 and 3.03, respectively. As π approaches 0 or 1, both α₁ and α₂ get bigger for any given value of n; alternatively expressed, the minimum for both |α₁| and α₂ is at π = .5.
The effect that n has on the shape of the distribution is quite remarkable. No matter how asymmetric the distribution is to begin with, a sufficiently large increase in n will produce a distribution that is very symmetric; the symmetry is measured by the value of α₁. The limit as n → ∞ for α₁ is 0, and the limit for α₂ is 3. We will soon see that these two values for α₁ and α₂ keep occurring as the limits for the values of α₁ and α₂ as a parameter n gets bigger and bigger. We might suspect that these particular values for the theoretical moments represent a special probability distribution, as indeed they do.
Consider a simple example. Suppose that the distribution of births by sex within a specified region over a short interval of time is given by the binomial distribution with parameters n, the total number of births in the region, and π, the probability of a male birth. Usually, the value of π is a little greater than .5, say .52. If n = 100, the mean of the distribution is μ'₁ = nπ = 52. What this means is that the value about which the actual number of male births varies is 52; for any given time period it might be less, or it might be more, but over a long period we expect the variations to average approximately 52.
The square root of the variance of the distribution is given by √(nπ(1 − π)), which in this case is 5. We can expect very approximately that about 90% of the time the number of male births will lie between 42 and 62. Certainly, we would be very surprised if the number of male births fell below 37 or exceeded 67.
Given that the standardized third moment is negative, but very small, this distribution is approximately symmetric about 52; it is equally likely to have fewer than 52 male births as it is to have more. The standardized fourth moment is not too much less than 3, so we can conclude that our distribution has neither thin tails nor fat ones. Very large and very small numbers of male births are neither very frequent nor very rare.
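A quick numerical check of this example is sketched below: it computes the mean, the standard deviation, and the probability of falling between 42 and 62 male births for a binomial with n = 100 and π = .52.

```python
from math import comb, sqrt

n, pi = 100, 0.52

def pmf(k):
    return comb(n, k) * pi**k * (1 - pi)**(n - k)

mean = n * pi
sd = sqrt(n * pi * (1 - pi))
prob_42_to_62 = sum(pmf(k) for k in range(42, 63))

print(mean, round(sd, 2))       # 52.0 and about 5.0
print(round(prob_42_to_62, 3))  # the bulk of the probability lies in this range
```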
We can summarize this section by pointing out that what we have discovered is that

- the conditions of an experiment determine the shape of the relevant probability function.
- the effects of the experimental conditions are reflected in the values taken by the parameters of the distribution.
- the shape of the distribution is measured in terms of the theoretical moments of the probability function.
- the theoretical moments are themselves functions of the parameters.
Expectation
This section is not what you might expect; it is about a very useful operation in statistics called taking the expected value of a random variable, or expectation. You have in fact already done it (that is, taken an expectation of a random variable), but you did not know it at the time. The operation that we used to determine the theoretical
moments was an example of the procedure known as expectation. Once again, we will
be able to obtain both insight into and ease of manipulation of random variables and
their properties by generalizing a specific concept and thereby extending our theory.
Taking expected values is nothing more than getting the weighted sums of functions of random variables, where the weights are provided by the probability function.
Let us define the concept of expectation formally as follows:

The expected value of any function of a random variable X, say g(X), is given by

    E[g(X)] = \sum_{i} g(X_i)\, pr(X_i)    (7.23)

We have already performed this operation several times while calculating the theoretical moments. For μ'₁, the function g(X) was just X; for μ₂ the function g(X) was (X − μ'₁)², and so on. The function g(X) can be any function you like. Whatever choice you make, the expectation of g(X), however defined, is given by Equation 7.23.
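Equation 7.23 is just a weighted sum, so it is natural to write it as a small helper that takes g as an argument. The sketch below applies it to the fair coin toss example discussed next; the helper name is my own, not the text's.

```python
def expectation(g, values, probs):
    """E[g(X)] = sum of g(x) * pr(x) over the values of X."""
    return sum(g(x) * p for x, p in zip(values, probs))

# Fair coin toss: X takes the values 0 and 1 with probability 1/2 each.
values, probs = [0, 1], [0.5, 0.5]

print(expectation(lambda x: x, values, probs))             # E(X)  = 0.5
print(expectation(lambda x: (x - 0.5)**2, values, probs))  # mu_2  = 0.25
```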
Consider the simple coin toss experiment with probability function discussed in Chapter 6. The distribution function was

    pr(X = 0) = \tfrac{1}{2}, \qquad pr(X = 1) = \tfrac{1}{2}

The expectation of X,

    E(X) = \sum_{X} X \, pr(X) = 0 \cdot \tfrac{1}{2} + 1 \cdot \tfrac{1}{2} = \tfrac{1}{2},

takes the value 1/2 in the case of a true coin toss. Note that I used the notation Σ_X to indicate "summation over the values of X."

What does this simple result mean? Clearly, it does not mean that in a coin toss one observes 1/2 on average. All one observes is a string of 0s and 1s, not a single 1/2 in sight. What the "average," or "mean," of the variable given by the expectation operation implies is that it is the value for which the average deviation is zero. Recall from Chapter 3 that we define the sample mean as that value such that the average deviations from it are zero; that is,

    \frac{1}{n}\sum_{i=1}^{n} (x_i - m_1') = 0

Similarly, we note that μ'₁ satisfies

    \sum_{i} (X_i - \mu_1') \, pr(X_i) = 0

where the deviations are now weighted by the corresponding probabilities.
Recalling our definition of a random variable (which is a function on all subsets
of a sample space to the real numbers), we recognize that g(X) is also a random variable. This is because it is a function of a random variable that is, in turn, a function of subsets of some sample space, so that g(X), through X, is itself a random variable. As such, g(X) will have associated with it a corresponding probability function, say h(Y), where Y = g(X). For the student comfortable in calculus, this approach is explored in the Exercises in terms of the concept termed "transformation of variables."
But we have also seen that any probability function has associated with it a set of theoretical moments. The first of these is the mean, or μ'₁. From the definition of μ'₁,

    \mu_1' = \sum_{i} Y_i \, h(Y_i)    (7.24)

where Yᵢ = g(Xᵢ) and h(Yᵢ) is the probability function for the random variable Y.
Let's make sure that we understand what is going on here. We began with a
random variable Xᵢ, i = 1, 2, . . . , n, with corresponding probability function f(Xᵢ). We then decided to create a new random variable Y, with values Yᵢ, by defining Yᵢ = g(Xᵢ), for example, Yᵢ = (Xᵢ − μ'₁)². We have claimed that there is some probability function, say h(Yᵢ), that provides the distribution of probabilities for
the random variable Y. At the moment we do not know how to find the probability
function h(Yᵢ), but because we know that Y is a random variable, we do know that there exists some probability function for the random variable Y; we are just giving it a name.
So if we knew the probability function h(Yᵢ), we could obtain the mean of the random variable Y. But that is often a problem; it is sometimes not easy to get the probability function h(Yᵢ) from information on Xᵢ, g(Xᵢ), and f(Xᵢ). Here is where the operation of expectation comes to the rescue. Using the expression for expectation, we have

    E(Y) = E[g(X)] = \sum_{i} g(X_i) \, pr(X_i)

where pr(Xᵢ) is the probability function for the random variable X. From this expression, we see that we can get the mean of the random variable Y = g(X) without having to find out what the probability function h(Y) is. We have in fact the result

    \sum_{i} Y_i \, h(Y_i) = \sum_{i} g(X_i) \, pr(X_i)

All the theoretical moments are examples of expectations; μ'₁ is the expectation of the random variable X; μ₂(X) is the expectation of the random variable (X − μ'₁)²; and so on. As we know, all of these theoretical moments are merely weighted sums of the values of the appropriate function of the random variable X. The weights are given by the probabilities associated with each value of X or with each value of some function of X.
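The point that E[g(X)] can be computed without ever writing down h(Y) can be illustrated with a small example. Below, Y = (X − 1)² for a made-up three-point distribution; computing Σ g(x) pr(x) directly and computing Σ y h(y) after building h(Y) give the same answer.

```python
from collections import defaultdict

# Hypothetical distribution for X and the transformation Y = g(X).
x_values = [0, 1, 2]
x_probs = [0.2, 0.5, 0.3]
g = lambda x: (x - 1)**2          # Y = (X - 1)^2 takes the values 0 and 1

# Route 1: expectation of g(X) using pr(X), no h(Y) needed.
e_direct = sum(g(x) * p for x, p in zip(x_values, x_probs))

# Route 2: build the probability function h(Y) first, then take E(Y).
h = defaultdict(float)
for x, p in zip(x_values, x_probs):
    h[g(x)] += p                  # X-values mapping to the same Y pool their probability
e_via_h = sum(y * p for y, p in h.items())

print(e_direct, e_via_h)          # both 0.5
```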
You might get a better feel for the idea of expectation if we consider a case where the number of variable values is two:

    E(X) = \pi X_1 + (1 - \pi) X_2

and π is the probability that X = X₁ and (1 − π) is the probability that X = X₂.
Suppose that X₁ = 1, X₂ = 2, and that we let π have various values. As an example, you might be contemplating an investment opportunity in which there are two possible outcomes: $1,000 in one case and $2,000 in the other; the probability of the first case occurring is represented by π.
Let us try π = 1, 3/4, 1/2, 1/4, and 0. For each of these values of π, recall that (1 − π) will have values 0, 1/4, 1/2, 3/4, and 1. We can write down for each choice of π the corresponding value of the expectation E(X) = 2 − π, namely 1, 5/4, 3/2, 7/4, and 2.
Table 7.4  Expectations of Various Functions for Binomial Probability Functions

                         Expectations for
  Function        π = 1    π = 3/4   π = 1/2   π = 1/4   π = 0
  g: x²             1        7/4       5/2      13/4       4
  f: x + 5          6       25/4      13/2      27/4       7
  h: 2 + 3x         5       23/4      13/2      29/4       8
  k: (x − 1)²       0        1/4       1/2       3/4       1
We see that as we shift the weight of the probability from being all on the value "1" to being all on the value "2," the expectation shifts from the value 1 to the value 2 correspondingly. The value of the expectation is always between 1 and 2 in these examples. If you try doing this for probabilities with three values for the random variable and then for random variables with four values, you will soon see that, as we allow the probability to vary over its whole possible range, the value for the expectation for any probability function must lie between the minimum and the maximum observations; that is, if X₍₁₎ and X₍ₙ₎ are the minimum and the maximum, respectively, of a random variable and pr(X) is the probability function, then

    X_{(1)} \le E(X) \le X_{(n)}

The smallest E(X) can be is X₍₁₎, when the probability of X₍₁₎ is one and, of course, the probabilities of all other values are 0. The largest that E(X) can be is X₍ₙ₎, when the probability of X₍ₙ₎ is one and the probabilities of all the other values are 0.
Similar statements can be made about the expectation of any function of a random variable X. With the five alternative probability distribution functions used in the example where X₁ = 1 and X₂ = 2, try obtaining for yourself the expectations of the functions g, f, h, and k listed in Table 7.4. You should get the results shown there.
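The entries of Table 7.4 can be generated mechanically. The sketch below evaluates the four functions at X₁ = 1 and X₂ = 2 for each value of π; the functional forms of f and h are taken from the row labels of the table as transcribed above, so treat them as assumptions rather than definitive.

```python
from fractions import Fraction as F

x1, x2 = 1, 2
functions = {
    "g: x^2":       lambda x: x**2,
    "f: x + 5":     lambda x: x + 5,
    "h: 2 + 3x":    lambda x: 2 + 3*x,
    "k: (x - 1)^2": lambda x: (x - 1)**2,
}

for name, fn in functions.items():
    row = []
    for pi in (F(1), F(3, 4), F(1, 2), F(1, 4), F(0)):
        # E[fn(X)] = pi * fn(x1) + (1 - pi) * fn(x2)
        row.append(pi * fn(x1) + (1 - pi) * fn(x2))
    print(name, [str(v) for v in row])
```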
Notice that there are some simple relationships between the various expectations and the functions g(·), f(·), h(·), and k(·). It would be convenient to formalize this observation. Consequently, consider the following claim: If a random variable Y is related to a random variable X by

    Y = a + bX

for a and b, which can be any constants, then

    E(Y) = a + b\,E(X)

You can easily demonstrate this for yourself by substituting the definition of Y into the definition for expectation:

    E(Y) = \sum_{i} (a + bX_i)\, pr(X_i) = a \sum_{i} pr(X_i) + b \sum_{i} X_i\, pr(X_i) = a + b\,E(X)
What if we define Y by

    Y = X_1 + X_2 + \cdots + X_n

where the {Xᵢ} are a set of n random variables? Suppose that the expectations of the Xᵢ, i = 1, 2, . . . , n, are e₁, e₂, . . . , eₙ, respectively; that is, E(Xᵢ) = eᵢ. What is the expectation of Y in terms of the E(Xᵢ)?
By a slight extension of the preceding argument we see that

    E(Y) = \sum_{j_1} \sum_{j_2} \cdots \sum_{j_n} (X_{1j_1} + X_{2j_2} + \cdots + X_{nj_n})\, pr(X_{1j_1})\, pr(X_{2j_2}) \cdots pr(X_{nj_n})

where pr(X₁) pr(X₂) · · · pr(Xₙ) is the joint probability function for the n random variables {Xᵢ}, i = 1, 2, . . . , n, assuming that the {Xᵢ} are statistically independent. Applying the elementary probability concepts that we learned in Chapter 6, we see that

    E(Y) = \sum_{j} X_{1j}\, pr(X_{1j}) + \sum_{j} X_{2j}\, pr(X_{2j}) + \cdots + \sum_{j} X_{nj}\, pr(X_{nj})

where X_{ij} represents the jth value of the ith random variable. So

    E(Y) = E(X_1) + E(X_2) + \cdots + E(X_n) = e_1 + e_2 + \cdots + e_n    (7.30)
In the chapters to come we will use these concepts and results repeatedly, so make
sure that you thoroughly understand these ideas.
Let us try a simple example using the random variables X₁ and X₂, where E(X₁) = 3, E(X₁²) = 9, and E(X₂) = 4. Let us assume that we know nothing further about the distributions of X₁ and X₂. To find the expectation of

    Y = 2X_1 + 5X_2

and of

    W = X_1^2 + 4 - 4X_1

we obtain by substitution

    E(Y) = 2 \times 3 + 5 \times 4 = 26

    E(W) = E(X_1^2) + 4 - E(4X_1) = 9 + 4 - (4 \times 3) = 1
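These two substitutions can be spelled out as a check. The sketch below uses only the linearity results just derived, together with the given values E(X₁) = 3, E(X₁²) = 9, and E(X₂) = 4.

```python
# Given information about the (otherwise unknown) distributions.
E_X1, E_X1_sq, E_X2 = 3, 9, 4

# E(Y) for Y = 2*X1 + 5*X2: the expectation of a sum is the sum of expectations.
E_Y = 2 * E_X1 + 5 * E_X2

# E(W) for W = X1**2 + 4 - 4*X1: constants pass through, and E(4*X1) = 4*E(X1).
E_W = E_X1_sq + 4 - 4 * E_X1

print(E_Y, E_W)  # 26 and 1
```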
In Equation 7.30, we defined the expectation of a "sum of variables" with respect to random variables that were independently distributed. Let us now extend that definition to variables that are not independently distributed. Let F(X₁, X₂) be the joint distribution of X₁, X₂, where F(X₁, X₂) cannot be factored into the product of the marginals. Consider:

    E(X_1 + X_2) = \sum_{X_1, X_2} (X_1 + X_2)\, F(X_1, X_2)
                 = \sum_{X_1} X_1 F_1(X_1) + \sum_{X_2} X_2 F_2(X_2)
                 = E(X_1) + E(X_2)

In this development, we used the result that the marginal distribution is obtained from the joint distribution by adding up the probabilities over the "other variable" (as was discussed in Chapter 6); Σ_{X₂} F(X₁, X₂) = F₁(X₁). The notation Σ_{X₁, X₂} implies that one sums over the values of both X₁ and X₂. In the previous expression, we used the idea that

    \sum_{X_2} X_1 F(X_1, X_2) = X_1 \sum_{X_2} F(X_1, X_2) = X_1 F_1(X_1)

where F₁(X₁) is the marginal distribution of X₁.
We can easily extend this idea to weighted sums of functions of the variables, one at a time. For example,

    E[a\, g_1(X_1) + b\, g_2(X_2)] = a\, E[g_1(X_1)] + b\, E[g_2(X_2)]
So far, it would seem that there is no difference between the independent and the nonindependent cases. That is because we have not looked at any functions involving the products of X₁ and X₂. Consider, using the same distribution:

    E(X_1 X_2) = \sum_{X_1, X_2} X_1 X_2 \, F(X_1, X_2)

No further simplification is possible. However, if the variables X₁ and X₂ are independent, the situation is different:

    E(X_1 X_2) = \sum_{X_1, X_2} X_1 X_2 \, F_1(X_1) F_2(X_2) = \left( \sum_{X_1} X_1 F_1(X_1) \right) \left( \sum_{X_2} X_2 F_2(X_2) \right) = E(X_1)\, E(X_2)
Let us explore this result using the simple bivariate distribution defined in Chapter 6. Table 6.4 gave the joint distribution of X₁ = (0, 1) and X₂ = (−1, 0, 1). We obtain E(X₁X₂) by summing X₁X₂ F(X₁, X₂) over the entries of that table, where all other terms have dropped out because zeros are involved in the multiplication. In contrast, consider the expectation of the product of the independent random variables {Y₁, Y₂} that were defined in Table 6.6 and have the same marginal distributions as those of {X₁, X₂}. Computing E(Y₁Y₂) in the same way, after having dropped all terms involving multiplication by zero, gives a different value. So even for two distributions where the values taken by the random variables are the same and the marginal probabilities are the same, there is a considerable difference in the expectation of a product between variables that are independent and nonindependent.
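The contrast between the independent and nonindependent cases can be seen with any small joint distribution. The sketch below uses a made-up dependent joint distribution for (X₁, X₂) with X₁ in {0, 1} and X₂ in {−1, 0, 1} (not the one from Table 6.4, which is not reproduced here), together with the independent distribution built from the same marginals.

```python
from itertools import product

x1_vals, x2_vals = [0, 1], [-1, 0, 1]

# A hypothetical dependent joint distribution F(X1, X2); entries sum to one.
F = {(0, -1): 0.25, (0, 0): 0.05, (0, 1): 0.10,
     (1, -1): 0.05, (1, 0): 0.25, (1, 1): 0.30}

# Marginal distributions obtained by summing over the other variable.
F1 = {a: sum(F[a, b] for b in x2_vals) for a in x1_vals}
F2 = {b: sum(F[a, b] for a in x1_vals) for b in x2_vals}

E_product_dependent = sum(a * b * F[a, b] for a, b in product(x1_vals, x2_vals))
E_X1 = sum(a * p for a, p in F1.items())
E_X2 = sum(b * p for b, p in F2.items())

# Under independence the joint factors into the marginals, so the
# expectation of the product is the product of the expectations.
E_product_independent = sum(a * b * F1[a] * F2[b]
                            for a, b in product(x1_vals, x2_vals))

print(E_product_dependent)                 # 0.25 for this dependent joint
print(E_product_independent, E_X1 * E_X2)  # both 0.06
```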