Expectations, Variances, Covariances, and Sample Means

Random Variables

Ex. I = Income, E = Education. Joint population counts (millions):

                  I = 20K    I = 50K    I = 100K
    E < 12          10          5           0
    12 <= E < 16     5         15           0
    E >= 16          0          5           5

Probability Distributions

Discrete: In this case, there is probability at each point. Formally, we say that probability is described by a probability density function, f(t). With f(t) = Pr(I = t), we have f(20000) = Pr(I = 20000) = 15/45 in the table above (of the total population of 45 mil., 15 mil. have this income).

Continuous: Ex. As we divide income into many narrow categories, the probability of any single income value eventually becomes zero. In the continuous case, there is no probability at a point; instead there is probability "around" a point. Here a probability density, f(x), describes probability as an area under the curve f(x). For example:

    Pr(a < X < b) = ∫_a^b f(x) dx

Expectations

Definition: The expectation of a random variable may be viewed in some sense as an "average value" and is defined as:

    E(X) = Σ_x x f(x)       for X discrete
    E(X) = ∫ x f(x) dx      for X continuous

Discrete Example. For the Income data above:

    E(I) = (15/45)(20K) + (25/45)(50K) + (5/45)(100K) = 45.56K

From this, we can interpret E(I), the expectation of Income, as the average income in the population of 45 mil. individuals. Notice that the above calculation is made without reference to any other variables. In this case, we sometimes refer to E(I) as a marginal expectation. In contrast, we might want to know the expected value of income for someone with 16 or more years of education. We would denote this expectation as E(I | Education ≥ 16). We term this quantity a conditional expectation. We calculate it as before, except that now we restrict attention to individuals with 16 or more years of education.
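Both the marginal expectation E(I) and the conditional expectation computed next can be checked numerically. A minimal Python sketch; the dictionary layout and variable names are ours, with the joint counts (in millions) taken as reconstructed in the table above:

```python
# Joint population counts in millions from the Income/Education table.
# Keys are (education bracket, income); values are counts.
counts = {
    ("E<12", 20_000): 10,     ("E<12", 50_000): 5,      ("E<12", 100_000): 0,
    ("12<=E<16", 20_000): 5,  ("12<=E<16", 50_000): 15, ("12<=E<16", 100_000): 0,
    ("E>=16", 20_000): 0,     ("E>=16", 50_000): 5,     ("E>=16", 100_000): 5,
}
total = sum(counts.values())          # 45 (million)

# Marginal expectation: E(I) = sum over cells of income * Pr(cell).
E_I = sum(inc * n for (_, inc), n in counts.items()) / total
print(round(E_I, 2))                  # 45555.56, i.e. the 45.56K in the notes

# Conditional expectation E(I | Education >= 16): renormalize within that row.
row = [(inc, n) for (edu, inc), n in counts.items() if edu == "E>=16"]
E_I_given = sum(inc * n for inc, n in row) / sum(n for _, n in row)
print(E_I_given)                      # 75000.0
```

Conditioning is just the same weighted average restricted to one row, with the weights renormalized to sum to one.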
Namely:

    E(I | Education ≥ 16) = (0/10)(20K) + (5/10)(50K) + (5/10)(100K) = 75K

This conditional expectation is interpreted as the average income in the population of individuals with at least 16 years of education.

Linear Properties: For random variables X and Y and constants a and b:

    E(X + Y) = E(X) + E(Y),   and   E(aX + b) = aE(X) + b

Important Implication (Standardizing a Variable): Let E(X) = m. Then:

    E(X − m) = E(X) − m = m − m = 0.

In this manner, if we standardize X by defining Z = X − m, then E(Z) = 0. We will frequently use this standardization (as you did implicitly in E322).

Variance

Def: Var(X) ≡ E[(X − m)²].

Note the connection between Var(X) and E(X). In both cases we are calculating the average value (in the entire population) of something. In the case of E(X), we are calculating the average value of X; in the case of Var(X), we are calculating the average value of (X − m)². Note that the variance measures how far X is from its mean value (m) on average. In calculating the variance, it is convenient to note that:

    E[(X − m)²] = E[X² − 2Xm + m²]
                = E(X²) − 2mE(X) + m²
                = E(X²) − 2m² + m²
                = E(X²) − m²
                = E(X²) − [E(X)]²

By way of interpretation, the expected value of X is a weighted average and is a measure of central tendency. The variance of X is a measure of the "spread" in X values.

Property. Noting that E(X) = m implies E(aX + b) = am + b:

    Var(aX + b) = E[(aX + b) − (am + b)]² = E[(aX − am)²] = a²E[(X − m)²] = a²Var(X)

Important Implication (Completely Standardizing a Variable): Letting Z = [X − E(X)]/σ:

    E(Z) = 0   and   Var(Z) = 1

We will often use such standardized variables in testing hypotheses.

Relations Between Random Variables

Measuring a Linear Relationship: Covariance:

    Cov(X, Y) ≡ E[(X − E(X))(Y − E(Y))] = E(XY) − E(X)E(Y)

We will discuss in class why covariance measures linear relationships.
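The algebraic identities above — the variance shortcut E(X²) − [E(X)]², Var(aX + b) = a²Var(X), full standardization, and the covariance shortcut — can all be confirmed numerically. A sketch using simulated data of our own choosing (the distributions, constants, and names below are assumptions for illustration); the checks hold up to floating-point error:

```python
import random
import statistics as st

random.seed(0)
xs = [random.gauss(10.0, 3.0) for _ in range(50_000)]
m = st.fmean(xs)

# Variance shortcut: E[(X - m)^2] = E(X^2) - [E(X)]^2 (population form).
var_def = st.fmean((x - m) ** 2 for x in xs)
var_alt = st.fmean(x * x for x in xs) - m ** 2
assert abs(var_def - var_alt) < 1e-6

# Var(aX + b) = a^2 Var(X): the shift b drops out, the scale a enters squared.
a, b = 2.0, 7.0
var_lin = st.fmean(((a * x + b) - (a * m + b)) ** 2 for x in xs)
assert abs(var_lin - a * a * var_def) < 1e-6

# Full standardization Z = (X - m)/sigma gives E(Z) = 0 and Var(Z) = 1.
sigma = var_def ** 0.5
zs = [(x - m) / sigma for x in xs]
assert abs(st.fmean(zs)) < 1e-9
assert abs(st.fmean(z * z for z in zs) - 1.0) < 1e-9

# Covariance shortcut: Cov(X, Y) = E(XY) - E(X)E(Y).
ys = [0.5 * x + random.gauss(0.0, 1.0) for x in xs]
my = st.fmean(ys)
cov_def = st.fmean((x - m) * (y - my) for x, y in zip(xs, ys))
cov_alt = st.fmean(x * y for x, y in zip(xs, ys)) - m * my
assert abs(cov_def - cov_alt) < 1e-6
```

All the averages here are taken over the simulated population, so the identities are exercised exactly as the derivations state them.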
Important Property: Cov(X, Y) = 0 ⇒ Var(X + Y) = Var(X) + Var(Y).

To see why this is the case, for simplicity let E(X) = E(Y) = 0. Then, for Cov(X, Y) = 0:

    Var(X + Y) = E{[(X + Y) − 0]²}
               = E[X² + 2XY + Y²]
               = Var(X) + 2Cov(X, Y) + Var(Y)
               = Var(X) + Var(Y)

Remark: X, Y independent ⇒ Cov(X, Y) = 0.

Sample Means

With N as the sample size, define the sample mean of variables X_i as:

    X̄ = (Σ_{i=1}^N X_i)/N

Expectation of a sample mean. Assuming E(X_i) = m, i = 1, ..., N:

    E(X̄) = E[(Σ X_i)/N] = (Σ E(X_i))/N = (Nm)/N = m

Variance of a sample mean. Assume the X_i are identically distributed (all X_i come from the same distribution, which implies that they all have the same expectation and variance). Further assume that the X_i are all independent. As we will often make these assumptions, we adopt the convention of referring to the X_i as being i.i.d. (identically and independently distributed) in this case. Under these assumptions, the variance of a sample mean is given as:

    Var(X̄) = Var[(Σ_{i=1}^N X_i)/N] = (1/N²) Σ_{i=1}^N Var(X_i) = (1/N²)[Var(X_1) + ... + Var(X_N)] = Nσ²/N² = σ²/N

Notice how the variance declines as the sample size (N) increases.

Consistency. Viewing the sample mean as an estimate of the corresponding population mean, it is important to note that a sample mean is "probably close" to the population mean. As an example, suppose the random variables X_i are i.i.d. normal with expectation m = 0 and variance σ². Consider X̄(10) as a sample mean based on 10 observations and X̄(100) as a sample mean based on 100 observations. Note that both of these sample means are random variables. To see that this is the case, and to examine distributions for these estimators, it will suffice to focus on X̄(100). Collect 100 i.i.d. observations X_i and construct X̄_1(100) as the sample mean in this first sample of 100 observations. Collect a second set of 100 observations and construct X̄_2(100) as the sample mean in this second set.
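The Var(X̄) = σ²/N formula can be seen directly by simulation: draw many independent sample means and measure their spread. A sketch under parameters of our own choosing (σ = 2 is an assumption, not from the notes):

```python
import random
import statistics as st

random.seed(1)
sigma = 2.0   # assumed population standard deviation (our choice)

def sample_mean(n):
    """One sample mean of n i.i.d. N(0, sigma^2) draws."""
    return st.fmean(random.gauss(0.0, sigma) for _ in range(n))

# For each sample size N, the variance across many independent sample means
# should sit close to the theoretical value sigma^2 / N.
observed = {}
for n in (10, 100):
    means = [sample_mean(n) for _ in range(20_000)]
    observed[n] = st.fmean(mb * mb for mb in means)   # E(Xbar) = 0, so Var = E(Xbar^2)

assert abs(observed[10] - sigma**2 / 10) < 0.03    # theory: 0.4
assert abs(observed[100] - sigma**2 / 100) < 0.01  # theory: 0.04
```

Tenfold more data per sample cuts the variance of the sample mean by a factor of ten, exactly as the formula predicts.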
Continue in this manner to construct X̄_3(100), ..., X̄_1000(100), ... These are random variables, with the value of each sample mean being a priori unknown. The distribution of the X̄_j(100)'s is shown by the solid curve in the graph below. In a similar manner, we can construct sample means based on 10 observations to obtain X̄_1(10), X̄_2(10), ... The distribution of the X̄_j(10)'s is shown by the dashed curve below. In both cases, these sample means have expectation m = 0. However, the variance in the large sample (100 observations) is much smaller than that in the small sample (10 observations). You should be able to calculate the variance in each case. Notice that the estimator is likely to be closer to the truth (m = 0) when the sample size is larger. Indeed, as the sample size increases, the probability of the estimator being "near" the truth increases. In this case, formally we would say that the estimator converges in probability to the truth. We will not adopt this formalism in this class, and instead simply say that in large samples the estimator is likely to be close to the truth. When this happens, we will refer to the estimator as being consistent.

[Figure: Distributions of Sample Means — solid curve: X̄(100); dashed curve: X̄(10)]
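The "probably close" language of consistency can itself be illustrated by simulation: the fraction of sample means landing within a fixed band around the truth m = 0 grows with the sample size. A sketch with parameters of our choosing (σ = 1 and a ±0.2 band are assumptions for illustration):

```python
import random
import statistics as st

random.seed(2)
sigma, band, reps = 1.0, 0.2, 2_000

def sample_mean(n):
    """One sample mean of n i.i.d. N(0, sigma^2) draws."""
    return st.fmean(random.gauss(0.0, sigma) for _ in range(n))

# Fraction of sample means within `band` of the truth m = 0, per sample size.
near = {n: sum(abs(sample_mean(n)) < band for _ in range(reps)) / reps
        for n in (10, 100, 1000)}

# Consistency, informally: this fraction rises toward 1 as N grows.
assert near[10] < near[100] < near[1000]
```

With σ = 1, theory gives Pr(|X̄(N)| < 0.2) of roughly 0.47, 0.95, and essentially 1 for N = 10, 100, 1000, and the simulated fractions track those values.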