Expectations, Variances, Covariances, and Sample Means
Random Variables
Ex. I = Income, E = Education. Joint population counts (in millions):

             E < 12   12 ≤ E < 16   E ≥ 16
  I = 20K      10          5           0
  I = 50K       5         15           5
  I = 100K      0          0           5
Probability Distributions
Discrete: In this case, there is probability at each point. Formally, we say
that probability is described by a probability density function, f(t). With
f(t) = Pr(I = t), f(20000) = Pr(I = 20000) = 15/45 in the above table (of the
total population of 45 mil, 15 mil have this income).
Continuous: Ex. Many narrow income categories— probability eventually
becomes zero. In the continuous case, there is no probability at a point,
but instead there is probability "around" a point. In this case, a probability
density, f(x), describes probability as an area under the curve f(x). For
example:
Pr(a < X < b) = ∫_a^b f(x) dx
Expectations
Definition: The expectation of a random variable may be viewed in some
sense as an "average value" and is defined as:

E(X) = Σ_x x f(x)       for X discrete
E(X) = ∫ x f(x) dx      for X continuous
Discrete Example. For the Income data above:
E(I) = ([15] 20K + [25] 50K + [5] 100K) / 45
     = (15/45) 20K + (25/45) 50K + (5/45) 100K ≈ 45.56K
From above, we can interpret E(I), the expectation of Income, as the average
income in the population of 45 mil. individuals. Notice that the above
calculation is made without reference to any other variables. In this case,
we sometimes refer to E(I) as a marginal expectation. In contrast, we might
want to know the expected value of income for someone with 16 or more
years of Education. We would denote this expectation as:
E(I | Education ≥ 16).
We term this quantity a conditional expectation. We calculate it as before, except that now we restrict attention to individuals with 16 or
more years of education. Namely:
E(I | Education ≥ 16) = ([0] 20K + [5] 50K + [5] 100K) / 10
                      = (5/10) 50K + (5/10) 100K = 75K
This conditional expectation is to be interpreted as average income in the
population of individuals with at least 16 years of education.
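Both calculations can be reproduced directly from the joint table. The sketch below (a minimal illustration; the variable names are ours, while the counts and the 45 mil total come from the table above) computes the marginal expectation E(I) and the conditional expectation E(I | Education ≥ 16):

```python
# Joint population counts (in millions) from the Income x Education table.
# Columns correspond to: E < 12, 12 <= E < 16, E >= 16.
counts = {
    20_000:  [10, 5, 0],
    50_000:  [5, 15, 5],
    100_000: [0, 0, 5],
}

total = sum(sum(row) for row in counts.values())  # 45 (million)

# Marginal expectation: E(I) = sum over t of t * Pr(I = t).
e_income = sum(income * sum(row) / total for income, row in counts.items())

# Conditional expectation E(I | E >= 16): restrict to the last column.
total_e16 = sum(row[2] for row in counts.values())  # 10 (million)
e_income_given_e16 = sum(income * row[2] / total_e16
                         for income, row in counts.items())

print(round(e_income, 2))   # marginal expectation: about 45555.56
print(e_income_given_e16)   # conditional expectation: 75000.0
```

Note that the conditional expectation simply repeats the marginal calculation with the probabilities renormalized over the subpopulation with E ≥ 16.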
Linear Properties: For random variables X and Y and constants a and b:
E(X + Y ) = E(X) + E(Y ); and E(aX + b) = aE(X) + b
Important Implication (Standardizing a Variable): Let E(X) = m.
Then:

E(X - m) = E(X) - m = m - m = 0.

In this manner, if we standardize X by defining Z = X - m, then E(Z) = 0.
We will frequently use this standardization (as you did implicitly in E322).
Variance
Def: Var(X) ≡ E[(X - m)²]. Note the connection between
Var(X) and E(X): in both cases we are calculating the average value (in
the entire population) of something. In the case of E(X), we are calculating
the average value of X; in the case of Var(X), we are calculating the average
value of (X - m)². Note that the variance measures how far X is from its
mean value (m) on average. In calculating the variance, it is convenient to
note that:

E[(X - m)²] = E[X² - 2Xm + m²]
            = E(X²) - 2mE(X) + m²
            = E(X²) - 2m² + m²
            = E(X²) - [E(X)]²
By way of interpretation, the expected value of X is a weighted average and
is a measure of central tendency. The variance of X is a measure of the
"spread" in X values.
Property. Noting that E(X) = m ⇒ E(aX + b) = am + b:

Var(aX + b) = E[((aX + b) - (am + b))²]
            = E[(aX - am)²] = a² E[(X - m)²] = a² Var(X)
Important Implication (Completely standardizing a variable): Letting
Z = [X - E(X)]/σ, where σ = √Var(X):

E(Z) = 0 and Var(Z) = 1.
We will often use such standardized variables in testing hypotheses.
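As a numerical check, the sketch below (the sample size and the uniform distribution are arbitrary choices of ours) standardizes a simulated variable and confirms that the result has mean near 0 and variance near 1:

```python
import random

random.seed(0)

# Draw a large sample from an arbitrary distribution (uniform here).
xs = [random.uniform(10, 20) for _ in range(100_000)]

# Estimate m = E(X) and sigma = sqrt(Var(X)) from the sample.
m = sum(xs) / len(xs)
var = sum((x - m) ** 2 for x in xs) / len(xs)
sigma = var ** 0.5

# Completely standardize: Z = (X - m) / sigma.
zs = [(x - m) / sigma for x in xs]

z_mean = sum(zs) / len(zs)
z_var = sum((z - z_mean) ** 2 for z in zs) / len(zs)
print(round(z_mean, 6), round(z_var, 6))  # approximately 0.0 and 1.0
```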
Relations Between Random Variables
Measuring a Linear Relationship: Covariance:
Cov(X, Y) ≡ E[(X - E(X))(Y - E(Y))]
          = E(XY) - E(X)E(Y)
We will discuss in class why covariance measures linear relationships.
Important Property: Cov(X, Y) = 0 ⇒ Var(X + Y) = Var(X) + Var(Y).
To see why this is the case, for simplicity let E(X) = E(Y) = 0. Then for
Cov(X, Y) = 0:

Var(X + Y) = E[((X + Y) - 0)²] = E[X² + 2XY + Y²]
           = Var(X) + 2Cov(X, Y) + Var(Y) = Var(X) + Var(Y)
Remark: X, Y independent ⇒ Cov(X, Y) = 0.
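Both the property and the remark can be illustrated by simulation. In the sketch below (the sample size and the normal distributions are our choices), X and Y are drawn independently, so the sample covariance is near zero and the variance of the sum is near the sum of the variances:

```python
import random

random.seed(1)
n = 100_000

# Independent draws, so Cov(X, Y) should be approximately zero.
xs = [random.gauss(0, 1) for _ in range(n)]   # Var(X) = 1
ys = [random.gauss(0, 2) for _ in range(n)]   # Var(Y) = 4

def mean(v):
    return sum(v) / len(v)

def var(v):
    mv = mean(v)
    return sum((x - mv) ** 2 for x in v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

sums = [x + y for x, y in zip(xs, ys)]

print(round(cov(xs, ys), 2))   # near 0
print(round(var(sums), 1))     # near Var(X) + Var(Y) = 1 + 4 = 5
```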
Sample Means
With N as the sample size, define the sample mean of variables Xi as:

X̄ ≡ Σ_{i=1}^{N} Xi / N
Expectation of a sample mean. Assuming E(Xi) = m, i = 1, ..., N:

E(X̄) = E[Σ_{i=1}^{N} Xi / N] = Σ_{i=1}^{N} E(Xi) / N = (Nm)/N = m.
Variance of a sample mean. Assume Xi is identically distributed (All Xi
come from the same distribution, which implies that they all have the same
expectation and variance). Further assume that the Xi are all independent.
As we will often make these assumptions, we will adopt the convention of
referring to the Xi as being i.i.d. (identically and independently distributed)
in this case. Under these assumptions, the variance of a sample mean is
given as:
Var(X̄) = Var[Σ_{i=1}^{N} Xi / N] = (1/N²) Σ_{i=1}^{N} Var(Xi)
        = (1/N²) [Var(X₁) + ... + Var(X_N)] = (Nσ²)/N² = σ²/N,

where σ² = Var(Xi) is the common variance of the Xi.
Notice how the variance declines as the sample size (N) increases.
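The σ²/N result can be checked by simulation. The sketch below (σ², N, and the number of replications are our choices) draws many samples of size N and compares the empirical variance of the resulting sample means with σ²/N:

```python
import random

random.seed(2)
sigma2 = 4.0    # population variance (our choice)
N = 25          # sample size
reps = 20_000   # number of independent sample means to generate

means = []
for _ in range(reps):
    sample = [random.gauss(0, sigma2 ** 0.5) for _ in range(N)]
    means.append(sum(sample) / N)

# Empirical variance of the sample means across replications.
grand = sum(means) / reps
emp_var = sum((m - grand) ** 2 for m in means) / reps

print(round(emp_var, 3))   # should be near sigma2 / N = 4 / 25 = 0.16
```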
Consistency. Viewing the sample mean as an estimate of the corresponding
population mean, it is important to note that a sample mean is "probably
close" to the population mean. As an example, suppose that the random
variables Xi are i.i.d. normal with expectation m = 0 and variance σ².
Consider X̄(10) as a sample mean based on 10 observations and X̄(100) as
a sample mean based on 100 observations. Note that both of these sample
means are random variables. To see that this is the case and to examine
distributions for these estimators, it will suffice to focus on X̄(100). Collect
100 i.i.d. observations (Xi) and construct X̄1(100) as the sample mean in
this first sample of 100 observations. Collect a second set of 100 observations
and construct X̄2(100) as the sample mean in this second set. Continue in
this manner to construct X̄3(100), ..., X̄1000(100), ... These are random
variables with the value of each sample mean being a priori unknown. The
distribution of the X̄j(100)'s is shown by the solid curve in the graph below.
In a similar manner, we can construct sample means based on 10 observations
to obtain X̄1(10), X̄2(10), ... The distribution of the X̄j(10)'s is shown by
the dashed curve below. In both cases, these sample means have expectation
m = 0. However, the variance in the large samples (100 observations) is much
smaller than that in the small samples (10 observations). You should be able
to calculate the variance in each case. Notice that the estimator is likely to
be closer to the truth (m = 0) when the sample size is larger. Indeed, as the
sample size increases, the probability of the estimator being "near" the truth
increases. Formally, we would say that the estimator converges in probability
to the truth. We will not adopt this formalism in this class, and instead
simply say that in large samples the estimator is likely to be close to the
truth. When this happens, we will refer to the estimator as being consistent.
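The experiment described above can be reproduced numerically. The sketch below (with σ² = 1 and 1000 replications, both our choices) builds the X̄j(10)'s and the X̄j(100)'s and compares their spreads; the larger samples produce means that cluster much more tightly around m = 0:

```python
import random

random.seed(3)
m, sigma = 0.0, 1.0   # population mean and standard deviation (our choices)

def sample_means(n_obs, reps):
    """Construct `reps` sample means, each based on n_obs i.i.d. draws."""
    out = []
    for _ in range(reps):
        draws = [random.gauss(m, sigma) for _ in range(n_obs)]
        out.append(sum(draws) / n_obs)
    return out

xbar10 = sample_means(10, 1000)     # the X-bar_j(10),  j = 1, ..., 1000
xbar100 = sample_means(100, 1000)   # the X-bar_j(100), j = 1, ..., 1000

def var(v):
    mv = sum(v) / len(v)
    return sum((x - mv) ** 2 for x in v) / len(v)

# Both collections are centered at m = 0, but the N = 100 means
# are far less spread out, in line with Var(X-bar) = sigma^2 / N.
print(round(var(xbar10), 3))    # near sigma^2 / 10  = 0.1
print(round(var(xbar100), 3))   # near sigma^2 / 100 = 0.01
```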
[Figure: Distributions of Sample Means. Solid curve: sample means based on
100 observations; dashed curve: sample means based on 10 observations.]